Each and any viewer of a video or a television scene is his or her own proactive editor of the scene, having the ability to interactively dictate and select, in advance of the unfolding of the scene and by high-level command, a particular perspective by which the scene will be depicted, as and when the scene unfolds. Video images of the scene are selected, or even synthesized, in response to a viewer-selected (i) spatial perspective on the scene, (ii) static or dynamic object appearing in the scene, or (iii) event depicted in the scene. Multiple video cameras, each at a different spatial location, produce multiple two-dimensional video images of the real-world scene, each at a different spatial perspective. Objects of interest in the scene are identified and classified by computer in these two-dimensional images. The two-dimensional images of the scene, and accompanying information, are then combined in the computer into a three-dimensional video database, or model, of the scene. The computer also receives a user/viewer-specified criterion relative to which criterion the user/viewer wishes to view the scene. From the (i) model and (ii) the criterion, the computer produces a particular two-dimensional image of the scene that is in "best" accordance with the user/viewer-specified criterion. This particular two-dimensional image of the scene is then displayed on a video display. From its knowledge of the scene and of the objects and the events therein, the computer may also answer user/viewer-posed questions regarding the scene and its objects and events.
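
The selection step summarized above can be illustrated with a minimal sketch. It assumes a simplified two-dimensional field, cameras described only by position, heading and field of view, and a "best view" criterion of "nearest camera that has the object in view". The names `Camera`, `sees`, `best_camera` and `hand_off` are hypothetical and chosen for illustration; the criterion is an assumption, not the criterion of the specification.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class Camera:
    """Hypothetical camera record: position, pointing direction, field of view."""
    name: str
    x: float
    y: float
    heading_deg: float    # direction the camera points, in field coordinates
    half_fov_deg: float   # half of the horizontal field of view


def sees(cam, px, py):
    """True if point (px, py) lies inside the camera's horizontal field of view."""
    bearing = math.degrees(math.atan2(py - cam.y, px - cam.x))
    # Wrap the angular difference into [-180, 180) before comparing.
    off_axis = abs((bearing - cam.heading_deg + 180.0) % 360.0 - 180.0)
    return off_axis <= cam.half_fov_deg


def best_camera(cameras, px, py):
    """Assumed criterion: the nearest camera that sees the point, else None."""
    visible = [c for c in cameras if sees(c, px, py)]
    if not visible:
        return None
    return min(visible, key=lambda c: math.hypot(px - c.x, py - c.y))


def hand_off(cameras, track):
    """Map each tracked object position to the name of the selected camera."""
    return [getattr(best_camera(cameras, x, y), "name", None) for x, y in track]
```

Calling `hand_off` on a sequence of tracked positions yields the sequence of selected cameras over time; a change of name between consecutive entries corresponds to a hand-off from one camera to another.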

3d dauos, 17 Drawing Sheets 
[FIG. 6: viewer question-and-answer display. "Who is this player?" "He is J. Washington." "Where is J. Washington?" "J. Washington is here."]

[FIG. 9b]
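
The question-and-answer exchange of FIG. 6 can be sketched as a lookup against a table of tracked objects. This is only an illustration under stated assumptions: the dictionary `scene`, its coordinates, the functions `who_is_at` and `where_is`, and the 50-unit cursor tolerance are all hypothetical and do not come from the specification.

```python
import math

# Hypothetical scene table: object name -> current (x, y) position on the display.
scene = {"J. Washington": (320.0, 180.0), "football": (400.0, 210.0)}


def who_is_at(scene, cursor, max_dist=50.0):
    """Answer "who is this?" with the tracked object nearest the cursor.

    Returns None when the table is empty or nothing lies within max_dist.
    """
    if not scene:
        return None
    name, pos = min(scene.items(), key=lambda item: math.dist(item[1], cursor))
    return name if math.dist(pos, cursor) <= max_dist else None


def where_is(scene, name):
    """Answer "where is X?" with the object's current position, or None."""
    return scene.get(name)
```

With the sample table, `who_is_at(scene, (325.0, 185.0))` names the player under the cursor, and `where_is(scene, "J. Washington")` returns the position at which the display would mark him.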
MACHINE DYNAMIC SELECTION OF ONE VIDEO CAMERA/IMAGE OF A SCENE FROM MULTIPLE VIDEO CAMERAS/IMAGES OF THE SCENE IN ACCORDANCE WITH A PARTICULAR PERSPECTIVE ON THE SCENE, AN OBJECT IN THE SCENE, OR AN EVENT IN THE SCENE

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention generally concerns (i) multimedia, (ii) video, including video-on-demand and interactive video, and (iii) television, including television-on-demand and interactive television.

The present invention particularly concerns automated dynamic selection of one video camera/image from multiple real video cameras/images in accordance with a particular perspective, an object in the scene, or an event in the video scene.

The present invention still further concerns the creation of three-dimensional video image databases, and the location and dynamical tracking of video images of selected objects depicted in the databases for, among other purposes, the selection of a real camera or image to best show the object selected.

The present invention still further concerns (i) interactive selecting of video, or television, images on demand, (ii) the selection of video images in real time, or television, and/or (iii) the selection of virtual video images/television pictures that are linked to any of a particular perspective on the video/television scene, an object in the video/television scene, or an event in the video/television scene.

2. Description of the Prior Art

2.1 Limitations in the Present Viewing of Video and Television

The traditional model of television and video is based on a single video stream transmitted to a passive viewer. A viewer has the option to watch the particular video stream, and to re-watch should the video be recorded, but little else. Due to the emergence of the information highways and other related information infrastructure circa 1995, there has been considerable interest in concepts like video-on-demand, interactive movies, interactive TV, and virtual presence. Some of these concepts are exciting, and suggest many dramatic changes in society due to the continuing dawning of the information age.

It will shortly be seen that this specification teaches that a novel form of video, and television, is possible (and has, indeed, already been reduced to operative practice in rudimentary form as of the time of filing) where a viewer of video, or television, depicting a real-world scene may select a particular perspective from which perspective the scene will henceforth be presented. The viewer may alternatively select a particular object (which may be a dynamically moving object) or even an event in the real-world scene that is of particular interest. As the scene develops, its presentation to the viewer will prominently feature the selected object or the selected event (if occurring).

Accordingly, video presentation of a real-world scene in accordance with the present invention will be seen to be interactive with both (i) a viewer of the scene and, in the case of a selected dynamically moving object, or an event, in the scene, (ii) the scene itself. True interactive video or television is thus presented to a viewer.

The video system, and approach, described in this specification will be seen to be called Multiple Perspective Interactive ("MPI") video. MPI video will overcome several limitations of conventional video. See, for example, 1) Wendy E. Mackay and Glorianna Davenport; "Virtual video editing in interactive multimedia applications" appearing in Communications of the ACM, 32(7):802-810, July 1989; 2) Eitetsu Oomoto and Katsumi Tanaka; "OVID: Design and implementation of a video-object database system" submitted to IEEE Transactions on Knowledge and Data Engineering; 3) Glorianna Davenport, Thomas Aguirre Smith, and Natalio Pincever; "Cinematic primitives for multimedia" appearing in IEEE Computer Graphics & Applications, pages 67-74, July 1991; and 4) Anderson H. Gary; Video Editing and Post Production: A Professional Guide, Knowledge Industry Publications, 1988.

MPI video will also be seen to support the editing of, and viewer interaction with, video and television in a manner that is useful in viewing activities ranging from education to entertainment. In particular, in conventional video, viewers are substantially passive; all they can do is to control the flow of video by pressing buttons such as play, pause, fast forward or fast reverse. These controls essentially provide the viewer only one choice for a particular segment of video: the viewer can either see the video (albeit at a controllable rate), or skip it.

In the case of live television broadcast, viewers have essentially no control at all. A viewer must either see exactly what a broadcaster chooses to show, or else change away from that broadcaster and station. Even in sports and other broadcast events where multiple cameras are used, a viewer has no choice except the obvious one of either viewing the image presented or else using a remote control so as to "surf" multiple channels.

With the availability of increased video bandwidth due to new satellite and fiber optic video links, and with advances in several areas of video technology, it is the opinion of the inventors that the time has come to address [issues involved in providing truly interactive video and television] systems. Incidentally, author George Gilder argues [that because viewers really have no choice in the current form of television, it is destined to be replaced by a more] viewer-driven system or device. See George Gilder; Life After Television: The Coming Transformation of Media and American Life, W. W. Norton & Co., 1994.

The MPI video of the present invention will be seen to make considerable progress (even by use of currently existing technology) towards "liberating" video and TV from the traditional single-source, broadcast model, and towards placing each viewer in his or her own "director's seat".

A three-dimensional video model, or database, used in MPI video, and certain functions performed by MPI video, prospectively serve to make MPI video a revolutionary new media. This three-dimensional model, and the functions that it performs, are well and completely understood, and are completely taught within this specification. Alas, the video bandwidth required for each viewer, and the amount of computational power required, are both daunting and expensive (but realizable) requirements in terms of the communications and computer hardware available circa 1995. About 10* more video data than is within a modern television channel may usefully be transmitted to each viewer. Each viewer may usefully benefit from the computational power equivalent to several powerful engineering workstation computers (circa 1995). Once this is done, however, and in accordance with the teaching of the present invention and present specification, the "bounds of earth" are shed, and a viewer may interact with any three-dimensional real-world
scene much as if he/she were an omnipotent, prescient being whose vantage point on the scene is unfettered save only that it must be through a two-dimensional video "window" of the viewer's choice.

Rudimentary embodiments of the present invention that are not capable of full synthesis of virtual video (which embodiments only select among real image views and thus use less, and less expensive, communications and computer hardware resource) will still be seen to do many useful things. For example, even the rudimentary, first, embodiment of the present invention that is particularly taught within the present specification (which embodiment is already functionally operative) will be seen to do many useful and novel things in, by example, the particular context of the video (and television) presentation of American football (in which environment the model is exercised). For example, some particular few football players, and the football itself, will be seen to be susceptible of being automatically "tracked" during play by the MPI video system in order that a video image presented to a viewer by the system may be selectively "keyed" to the action of the game.

According to all the preceding introduction to the context of the present invention, the pertinent background to the present invention includes a knowledge of, and sensitivity to, the present state of the computer and communications sciences. A practitioner of the multi-media arts reading the present specification is expected to be knowledgeable, and realistic, about both (i) the very considerable computer system resources that are needed, at least in 1995, in order to exercise the MPI video model of the present invention (a) in real time, and/or (b) at maximum unfettered versatility to each and every viewer, as well as (ii) the historically demonstrated rapidity in the improvement of these resources.

The present invention, and the present specification, will be seen to "lay out" the method, and system, of MPI video in a hierarchy of compatible embodiments leading all the way to the ultimate implementation of (i) full-customized, video views and, in real time, (ii) television, images for (iii) each and every viewer of a three-dimensional, real-world, scene (as is simultaneously imaged by multiple video cameras). Progress already achieved towards this ultimate goal will be seen, it is respectfully suggested, to be more substantial, more cost effective, and more immediately useful than might have been expected. However, progress in implementing MPI video beyond the rudimentary system of the present invention is transpiring even as of the date of filing, and still further progress is imminent. However, MPI video will not likely span the gap all the way from the rudimentary, first, system taught within the present specification all the way to its ultimate embodiment in a single step. Nor need it do so. As the system and method become better understood, it will be seen that both offer a logical, and orderly, progression of useful, and interesting, capabilities. To the video and television viewing public this is what is called "progress".

To continue with the football scenario, a logical "next step" in deployment of the MPI video of the present invention beyond its rudimentary implementation as is taught within this specification is as non-real-time pre-processed "game video". Such a "game video" would likely be recorded on the now-emerging new-form CD-ROM, where, for example, twenty-three different "tracks" would be recorded to profile each player on the field from both teams, and also the football. A "next step" beyond even this will be to send the same information on twenty-three channels live, and in real time, on game day. Subscriber/viewer voting may permit a limited interaction. For example, the "fans" around a particular television might select a camera, or synthesis of a virtual camera, profiling the "defensive backs". Finally, and what will undoubtedly transpire only after the lapse of some years from the present time (1995), it should be possible for each fan to be his or her own "game director", and to watch in real time substantially exactly what he or she wants.

Accordingly, to exercise the MPI video system of the present invention [illegible] confidently expected, in the fields of computer vision, multimedia database and human interface.

See, for example, Swanberg: 1) Deborah Swanberg, Terry Weymouth, and Ramesh Jain; "Domain information model: an extended data model for insertions and query" appearing in Proceedings of the Multimedia Information Systems, pages 39-51, Intelligent Information Systems Laboratory, Arizona State University, February 1992; and 2) Deborah Swanberg, Chiao-Fe Shu, and Ramesh Jain; "Architecture of a multimedia information system for content-based retrieval" appearing in Audio Video Workshop, San Diego, CA, November 1992.

See, for example, Hampapur: 1) Arun Hampapur, Ramesh Jain, and Terry Weymouth; "Digital video segmentation" appearing in Proceedings of the ACM Conference on Multimedia, Association of Computing Machinery, October 1994; and 2) Arun Hampapur, Ramesh Jain, and Terry Weymouth; "Digital video indexing in multimedia systems" appearing in Proceedings of the Workshop on Indexing and Reuse in Multimedia Systems, American Association of Artificial Intelligence, August 1994.

See, for example, Zhang: 1) H. J. Zhang, A. Kankanhalli, and S. W. Smoliar; "Automatic partitioning of video" appearing in Multimedia Systems, 1(1):10-28, 1993; and 2) Hong Jiang Zhang, Yihong Gong, Stephen W. Smoliar, and Shuang Yeo Tan; "Automatic parsing of news video" appearing in Proceedings of the IEEE Conference on Multimedia Computing Systems, May 1994.

See also, for example, 1) Akio Nagasaka and Yuzuru Tanaka; "Automatic video indexing and full-video search for object appearances" appearing in 2nd Working Conference on Visual Database Systems, pages 119-133, Budapest, Hungary, October 1991, IFIP WG 2.6; 2) Farshid Arman, Arding Hsu, and Ming-Yee Chiu; "Image processing on compressed data for large video databases" appearing in Proceedings of the ACM MultiMedia, pages 267-272, California, USA, June 1993, Association of Computing Machinery; 3) Glorianna Davenport, Thomas Aguirre Smith, and Natalio Pincever; op cit; 4) Eitetsu Oomoto and Katsumi Tanaka, op cit; and 5) Akihito Akutsu, Yoshinobu Tonomura, Hideo Hashimoto, and Yuji Ohba; "Video indexing using motion vectors" appearing in Proceedings of SPIE: Visual Communications and Image Processing 92, November 1992.

When considering these references, it should be recalled that MPI video is already operative, as will be explained and shown, right now, and as of the time of filing. Actual results obtained on the MPI video system will be presented in this specification. The above-stated references to certain breaking, state-of-the-art, developments are deemed appropriate for inclusion within the instant Background of the Invention section of this specification simply because it should be understood that the present invention has a particularly great and likely chronologically very long,
"spin-out". Nonetheless that immediately useful, and arguably practical and cost effective, results are obtainable directly from the MPI video system presented within this specification, the software programming in implementation of the MPI video system that already exists could profit from (i) a fiber optic interconnect to, and (ii) a computer on the top of (or inside), every television in America. Accordingly, and while the present embodiment of the invention should be duly regarded, it will be particularly important in considering the present specification to note and understand how the MPI video method and system of the present invention is greatly expandable and extendable in each of (i) the sophistication of system functions performed, (ii) the speed of system performance, and (iii) the breadth of system deployment.

2.2 Specific Prior Art Concerning Video and Television

U.S. Pat. No. 5,109,425 to Lawton for METHOD AND APPARATUS FOR PREDICTING THE DIRECTION OF MOVEMENT IN MACHINE VISION concerns the detection of motion in and by a [computer-simulated cortical] network, particularly for the motion of [a mobile rover]. Interestingly, a subsystem of the present invention will be seen to capture the image of a moving [mobile rover within] a scene, and to classify the image captured to the rover and to its movement. However, the MPI video system of the present invention, and its subsystem, will be seen to function quite differently than the method and apparatus of Lawton in the detection of motion. The MPI video system of the present invention will be seen to avail itself of multiple two-dimensional video images from each of multiple stationary cameras as are assembled into a three-dimensional video image database (an important element of the present invention). Once the multiple images of the MPI video system of the present invention are available for object, and for object track (i.e., motion), correlation(s), then it will prove a somewhat simpler matter to detect motion in the MPI video system of the present invention than in prior art single-perspective systems such as that of Lawton.

U.S. Pat. No. 5,170,440 to Cox for PERCEPTUAL GROUPING BY MULTIPLE HYPOTHESIS PROBABILISTIC DATA ASSOCIATION is a concept of a computer [vision algorithm. Again, the] MPI video system of the present invention will be seen to be able to start with much more information than any single-point machine vision system. Recall that the MPI video system of the present invention will be seen to avail itself of multiple two-dimensional video images from each of multiple stationary cameras, and that these multiple two-dimensional images are, moreover, assembled into a three-dimensional video image database.

The general concepts, and voluminous prior art, concerning "machine vision", "(target) classification", and "(target) tracking" are all relevant to the present invention. However, the MPI video system of the present invention, while doing very, very well in each of viewing, classifying and tracking, will be seen to come to these problems from a very different perspective than does the prior art. Namely, the prior art considers platforms (whether they are rovers or warships) that are "located in the world", and that must make sense of their view thereof from essentially but a single perspective centered on present location.

The present invention functions oppositely. It "defines the world", or at least so much of the world as is "in view" to (each of) multiple video cameras. The MPI video system of the present invention has at its command a plethora of observable and correlated, simultaneous, positional information. Once it is known where each of multiple cameras are, and are pointing, it is a straightforward matter for the computer processes of the present invention to fix, and to track, items in the scene. In this manner the invention is a rough optical counterpart and analog of the Atlantic Undersea Acoustic Test range for acoustic (sonar) detection, classification and tracking, and is likewise a counterpart and analog to multi-antenna correlated radars such as in the Naval Tactical Data System for electromagnetic (radar) detection, classification and tracking.

The present invention will be seen to perform co-ordinate transformation of (video) image data (i.e., pixels), and to do this during generation of two- and three-dimensional image databases. U.S. Pat. No. 5,259,037 to Plunk for AUTOMATED VIDEO IMAGERY DATABASE GENERATION USING PHOTOGRAMMETRY discusses the conversion of forward-looking video or motion picture imagery into a database particularly to support image generation of a "top down" view. The present invention does not require any method so sophisticated as that of Plunk, who uses a Kalman filter to compensate for the roll, pitch and yaw of the airborne imaging platform: an airplane. In general the necessary image transformations of the present invention will not be plagued by dynamic considerations (other than camera pan and zoom), the multiple cameras remaining fixed in position imaging the scene (in which scene the objects, however, may be dynamic).

Finally, U.S. Pat. No. [number illegible] for APPARATUS AND METHOD FOR EDITING A VIDEO RECORDING BY SELECTING AND DISPLAYING VIDEO CLIPS shows and discusses some of the concerns, and desired displays, presented to a human video editor. In the MPI video system of the present invention much of this function will be seen to be assumed by hardware.

The system of the present invention will be seen to, in its rudimentary embodiment, perform a spatial positional calibration of each of multiple video cameras from the images produced by such cameras because, quite simply, in the initial test data the spatial locations of the cameras were neither controlled by, nor even known to, the inventors. This [illegible] present invention normally originates from multiple cameras for which (i) the positions, and (ii) the zoom in/zoom out parameters, are well known, and indeed are controlled by the system. However, and notably, prior knowledge of camera position(s) may be "reverse engineered" by a system from a camera(s') image(s). Two prior art articles so discussing this process are "A Camera Calibration Technique using Three Sets of Parallel Lines" by [author illegible] appearing in Machine Vision and Applications [volume and pages illegible] and "A Theory of Self-Calibration of a Moving Camera" by S. J. Maybank and O. D. Faugeras appearing in International Journal of Computer Vision, 8:2:123-151 (1992).

In general, many computer processes performed in the present invention are no more sophisticated than are the computer processes of the prior art, but they are, in very many ways, often greatly more audacious. The present invention will be seen to manage a [very great amount of] video data. A three-dimensional video model, or database, is constructed. For any sizable duration of video (and length thereof may perhaps not have to be retained at all, or [illegible]) this database is huge. More problematical, it takes very considerable computer "horsepower" to construct this database, howsoever long its video data should be held and used.

However, the inventors having taken a major multi-media laboratory at a major university and "rushed in where angels
fear to tread" in attempting to develop a form of video presentation that is believed to be wholly new, the inventors have found the "ground" under their invention to be firmer, and the expected problems more tractable, than expected. In particular the inventors have found (a few strategic simplifications being made) that presently-available computer and computer systems resources can produce usable results in an MPI video system. Such is the story of the following sections.

SUMMARY OF THE INVENTION

1. Summary of the Function of the Invention

The present invention contemplates making each and any viewer of a video or a television scene to be his or her own proactive editor of the scene, having the ability to interactively dictate and select, in advance of the unfolding of the scene and by high-level command, a particular perspective by which the scene will be depicted, as and when the scene unfolds.

The viewer can command the selection of real video images of the scene in response to any of his or her desired and selected (i) spatial perspective on the scene, (ii) static or dynamically moving object appearing in the scene, or (iii) event depicted in the scene. The viewer (any viewer) is accordingly considerably more powerful than even the broadcast video editor of, for example, a live sporting event circa 1995. The viewer is accorded the ability to select in advance a preferred video perspective of view as optionally may be related to dynamic object movements and/or to events unfolding in the scene.

For example, in accordance with the present invention a viewer of an American football game on video or on television can command a consistent "best" view of (i) one particular player, or, alternatively (ii) the football itself as will be, from time to time, handled by many players. The system receives and processes multiple video views (images) generally of the football field, the football and the players within the game. The system classifies, tags and tracks objects in the scene, including static objects such as field markers, and dynamically moving objects such as the football and the football players. Some of the various views (images) will at times, and from time to time, be "better", by various criteria, in showing certain things than are other views.

In the rudimentary embodiment of the invention taught within this specification the system will consistently, dynamically, select and present a single "best" view of the selected object (for example, the football, or a particular player). This will require, and the system will automatically accomplish, a "handing off" from one camera to another camera as different ones of multiple cameras best serve to image over time the selected object.

The system of the invention is powerful (i) in accepting viewer specification at a high level of those particular objects and/or events in the scene that the user/viewer desires to be shown, and (ii) to subsequently identify and track all user/viewer-selected objects and events (and still others for other users/viewers) in the scene.

The system of the present invention can also, based on its scene knowledge database, serve to answer questions about the scene.

Finally, the system of the present invention can replay events in the scene from the same perspective, or from selected new perspectives, depending upon the desires of the user/viewer. It is not necessary for the user/viewer to "find" the best and proper image; the system performs this function. For example, if the user/viewer wants to see how player number twenty (#20) came to make an interception in the football game, then he or she could order a replay of the entire down focused on player number twenty (#20).

For example, and continuing with the example of an American football game, an individual viewer can ask questions like: Who is the particular player shown marked by my cursor? Where is player Mr. X? Where is the football? In advanced embodiments of the system of the present invention, the user/viewer can generate commands like: "replay for me at speed the event of the fumble". Such commands are honored by the system of the present invention.

2. Summary of the Method and System In Implementation of the Invention

The present invention contemplates selecting real video/television images of a scene from multiple real video/television images of the scene, particularly so as to select video/television images that are linked to any such (i) spatial perspective(s) on the scene, (ii) object(s) in the scene, or (iii) event(s) in the scene, as are selectively desired by a user/viewer to be shown.

The method of the invention is directed to presenting to a user/viewer a particular, viewer-selected, two-dimensional video image of a real-world, three-dimensional, scene. In order to do so, multiple video cameras, each at a different spatial location, produce multiple two-dimensional images of the real-world scene, each at a different spatial perspective. Objects of interest in the scene are identified and classified in these two-dimensional images. These multiple two-dimensional images of the scene, and their accompanying object information, are then combined in a computer into a three-dimensional video database, or model, of the scene. The database is called a model because it incorporates information about the scene as well as the scene video. It incorporates, for example, a definition, or "world view", of the three-dimensional space of the scene. The model of a football game knows, for example, that the game is played upon a football field replete with static, fixed-position, field yard lines and hash mark markings, as well as of the existence of the dynamic objects of play. The model is, it will be seen, not too hard to construct so long as there are, or are made to be, sufficient points of reference in the imaged scene. It is, conversely, almost impossible to construct the 3-D model, and select or synthesize the chosen image, of an amorphous scene, such as the depths of the open ocean. (Luckily, viewers are generally more interested in people in the world than in fish.)

The computer also receives from a prospective user/viewer of the scene a user/viewer-specified criterion relative to which criterion the user/viewer wishes to view the scene. From the (i) 3-D model and (ii) the criterion, the computer selects a particular two-dimensional image of the scene that is in accordance with the user/viewer-specified criterion. This particular two-dimensional image of the real-world scene is then displayed on a video display to the user/viewer.

At the highest level the description of the previous paragraphs regarding the method of the present invention, and the computer-based system performing the method, may not seem much different in effect than that prior art system presently accorded, say, a network sports director who is able to select among many video feeds in accordance with his (or her) own "user/viewer-specified criterion". The significance of the production of the three-dimensional video model (of the real-world scene) by the method, and in the system, of the present invention is, at this highest level of
describing the system's functions, as yet unclear. Consider, though, what flows from the method, and the system, of the present invention that produces and uses a three-dimensional video model.

First, the computer may ultimately produce, and the display may finally show, only such a particular two-dimensional image of the scene (in accordance with the user/viewer-specified criterion) as was originally one of the images of the real-world scene that was directly imaged by one of the multiple video cameras. This is, indeed, the way the rudimentary embodiment of the invention taught and shown herein functions. At first consideration, this automatic camera selection may seem unimpressive. However, consider not only that the user/viewer criterion is specifiable at a high level, but that the appropriate, selected, scene image [is selected in] accordance with just what is imaged, and in what location(s), by which camera(s), and in accordance with just what transpires in the scene. In other words, the evolving contents of the scene, as the scene is imaged by the multiple cameras and as it is automatically interpreted by the computer, determine just what image of the scene is shown at any one time, and just what sequence of images are shown from time to time, to the user/viewer. Action in the scene feeds back on how the scene is shown to the user/viewer.

[Second, the user/viewer-specified criterion may be of a particular] spatial perspective relative to which the user/viewer wishes to view the scene. This spatial perspective need not be immutably fixed, but can instead be linked to a dynamic object in the scene. In the case of selecting a scene view from a user/viewer-specified spatial perspective, the computer produces from the three-dimensional model a particular two-dimensional image of the scene that is in best accordance with some particular spatial perspective criterion that has been received from the viewer. The particular two-dimensional image of the scene that is displayed is a real image of the scene as was obtained by any of the video cameras. The computer will automatically select, and the display will still display, over time, those actual images of

[...]

will again combine the images from the multiple video cameras not only so as to generate a three-dimensional video model of the scene, but so as to generate a model in which one or more dynamically occurring event(s) in the scene are recognized and identified. The computer will subsequently produce, and the display will show, a particular image that is appropriate to best show the selected event. Clearly this is again a feedback loop: the location of an event in the scene influences, in accordance with a viewer selection of the event, how the scene is shown.

And finally, the method of the invention may be performed in real time as interactive television. The television scene is presented to a user/viewer interactively in accordance with the user/viewer-specified criterion.

These and other aspects and attributes of the present invention will become increasingly clear upon reference to the following drawings and accompanying specification.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a top-level block diagram showing the high level architecture of the system for Multiple Perspective Interactive (MPI) video in accordance with the present invention.

FIG. 2 is [illegible] previously seen in block diagram in FIG. 1, in use for interactive football video.

FIG. 3 is a diagrammatic representation of the hardware configuration of the MPI system in accordance with the present invention, previously seen in block diagram in FIG. 1.

FIG. 4 is a pictorial representation of a video display particularly showing how, as a viewer interface feature of the Multiple Perspective Interactive (MPI) video system in accordance with the present invention previously seen in block diagram in FIG. 1, a viewer can select one of the many [illegible]

FIG. 5 is a diagrammatic representation showing how

tfic scene as are imaged, over time, by different ones of tiie ^^^^ cameras provide focus on different objects in tfie 

multiple video camwas. Automated scene switdung, cspc- 40 ^ system in accOTdance witii tiic present invention; 

dally in relation to dynamic objects in ttie scene, is not ^j^Q^jng ttie viewer's current interest an appropriate 

known to tiie inventors to exist in tiie prior art cmiem must be selected. 

Third, ttie userMcwcr-spccified criterion may be <tf a 6 is anotfier pictorial representation of tiie video 

particular object in ttie scene. In tiiis case the computer wm • Multiple Perspective Interactive (MPI) video 

combine ttic images from ttie multiple video cameras not 45 accordance witii tiie present invention, tius tiie 

only so as to generate a tfiree-dhnenslonal vito inodd ot ^j^^ particularly showing a viewer-controlled 

die scene, but so as to generate a model m which objects m j|^^^.^ns[onal cursw serving to mark a point in tiirce- 
ttie scene are identified. The computer wiU suteequentiy (3.1^) ^^^i tiie projection of tiie 3-D 

produce, and tiie display will subsequentiy ^^^^ j*>^P^^ ^^^^ I 2-D cursor. 

ticular image appropriate to best show the selected object 50 ^ ^ ^ j^agram showing coordinate systems for 

Clearlytiusisafeedb^kloopr ttiek^on^^^^^ ca^ Lft^fnl^e Perspective interactive 

ttie scene serves to influence, m acocMdance witti a usa^ ^^y^oZtcm in accordance witii ti^ present inven- 

vicwcr selection of tiie object, how tiie scene is shown. (MFl) viaeo system in accwumc^^ 

riearlv the same video scene could l>e, if desired, shown tion. . « ^ . . , 

S« id ov^oS^ focusing Wcw on a diff«nt „ HG. 8 consisting of FIGS. 9a through M 

^ZiM^ in fee scene representatioii. and aocompanying dias-am, of three sepa- 
tS^v^'^e seSTScc, may dfl.er be stadc. and video dispUys in .he MulUple P«^ve ^.to^ve 

is a soccial tvi>e that nnambiguoosly specifies an object in In die scene. . „, . . _ . , 

L SyTas^tlon bel!^eenfl.robject posilioB and nO. 9 consisting of RGS. 9a and 96. ,s pirtorud rj^e- 

tiie cursw position in three dimensions, and is thus called "a seotation of two separate video dlspUys in the MiUt^le 

three-dimer^ional cursor". 65 Pcnpcctive intoactivc (MPD video system m accordance 

Fourth, the criterion specified by the user/view« maybe with the present invention If""^?";?^^ 

of a particular event in fte scene. In this case die computer image can be used for camera calibratmn; the frame of HG. 
9a having sufficient points for calibration but the frame of FIG. 9b having insufficient points for calibration.

FIG. 10, consisting of FIGS. 10a through 10c, is pictorial representation of three separate video frames, arising from three separate algorithm-selected video cameras, in the Multiple Perspective Interactive (MPI) video system in accordance with the present invention.

FIG. 11 is a schematic diagram showing a Global Multi-Perspective Perception System (GM-PPS) portion of the Multiple Perspective Interactive (MPI) video system in accordance with the present invention in use to take data from calibrated cameras covering a scene from different perspectives in order to dynamically detect, localize, track and model moving objects, including a robot vehicle and human pedestrians, in the scene.

FIG. 12 is a top-level block diagram showing the high level architecture of the Global Multi-Perspective Perception System (GM-PPS) portion, previously seen in FIG. 11, of the Multiple Perspective Interactive (MPI) video system in accordance with the present invention, the architecture showing the interaction between a priori information formalized in a static model and the information computed during system processing and used to formulate a dynamic model.

FIG. 13 is a graphical illustration showing the intersection formed by the rectangular viewing frustum of each camera scene onto the environment volume in the GM-PPS portion of the MPI video system of the present invention; the filled frustum representing possible areas where the object can be located in the 3-D model while, by use of multiple views, the intersection of the frustum from each camera will closely approximate the 3-D location and form of the object in the environment model.

FIG. 14, consisting of FIG. 14a and FIG. 14b, is a diagram of a particular, exemplary, environment of use of the GM-PPS portion, and of the overall MPI video system of the present invention; the environment being an actual courtyard on the campus of the University of California, San Diego, where four cameras, the locations and optical axes of which are shown, monitor an environment consisting of static objects, a moving robot vehicle, and several moving persons.

FIG. 15 is a pictorial representation of the distributed architecture of the GM-PPS portion of the MPI video system of the present invention wherein, (i) a graphics and visualization workstation acts as the modeler, (ii) several workstations on the network act as slaves which process individual frames based on the master's request so as to (iii) physically store the processed frames either locally, in a nearby storage server, or, in the real-time case, as digitized information on a local or nearby frame-grabber.

FIG. 16 is a diagram showing the derivation of a camera coverage table for an area of interest, or environment, in which objects will be detected, localized, tracked and modeled by the GM-PPS portion of the MPI video system of the present invention; each grid cell in the area is associated with its image in each camera plane while, in addition, the diagram shows an object dynamically moving through the scene and the type of information the GM-PPS portion of the MPI video system uses to maintain knowledge about this object's identity.

FIG. 17, consisting of FIGS. 17a through 17d, is four pictorial views of the campus courtyard previously diagrammed in FIG. 14 at global time 00:22:29:06; the scene containing four moving objects including a vehicle, two walkers and a bicyclist.

FIG. 18 is a pictorial view of a video display to the GM-PPS portion of the MPI video system of the present invention, the video display showing, as different components of the GM-PPS, views from the four cameras of FIG. 17 in a top row, and a panoramic view of the model showing hypotheses corresponding to the four moving objects in the scene in a bottom portion; the GM-PPS serving to detect each object in one or more views as is particularly shown by the bounding boxes, and serving to update object hypotheses by a line-of-sight projection of each observation.

FIG. 19 [illegible] pictorial views of the GM-PPS model showing various [illegible] 00:22:29:06; FIGS. 19a-19d correspond to four actual camera views.

FIG. 20, consisting of FIGS. 20a through 20d, is four pictorial views of the same campus courtyard previously diagrammed in FIG. 14, and shown in FIG. 17, at global time 00:22:39:06; the scene still containing four moving objects including a vehicle, two walkers and a bicyclist.

FIG. 21 is another pictorial view of the video display to the GM-PPS portion of the MPI video system of the present invention previously seen in FIG. 18, the video display now showing a panoramic view of the model showing the hypotheses corresponding to the four moving objects in the scene at the global time 00:22:39:06 as was previously shown in FIG. 20.

DESCRIPTION OF THE PREFERRED EMBODIMENT

1. Capabilities of the Multiple Perspective Interactive [illegible] Potential Implications of These Capabilities

The capabilities of the Multiple Perspective Interactive (MPI) video of the present invention are discussed even prior to teaching the system that realizes these capabilities in order that certain potential implications of these capabilities may best be understood. Should these implications be understood, it may soon be recognized that the present invention accords not merely a "fancy form" of video, but an in-depth change to the existing, fundamental, video and television viewing experience.

The present specification presents a system, a method and a model for Multiple Perspective Interactive ("MPI") video or television. In the MPI video model multiple cameras are used to acquire an episode or a program of interest from several different spatial perspectives. The cameras are real, and exist in the real world: to use a source camera, or a source image, that is itself virtual constitutes a second-level extension of the invention, and is not presently contemplated.

MPI video is always interactive (the "I" in MPI) in the sense that the perspective from which the video scene is desired to be, and will be, shown and presented to a viewer is permissively chosen by such viewer, and predetermined. However, MPI video is also interactive in that, quite commonly, the perspective on the scene is dynamic, and responsive to developments in the scene. This may be the case regardless that the real video images of the scene from which the MPI video is formed are themselves dynamic and may, for example, exhibit pan and zoom. Accordingly, a viewer-selectable dynamic presentation of dynamic events that are themselves dynamically imaged is contemplated by the present invention.

Consider, for example, the presentation of MPI video for a game of American football. The "viewer-selectable dynamic presentation" might be, for example, a viewer-selected imaging of the quarterback. This image is dynamic in accordance that the quarterback should, by his movement during play, cause that, in the simplest case, the images of several different video cameras should be successively selected. The football game is, of course, a dynamic event wherein the quarterback moves. Finally, the real-world source, camera, images that are used to produce the MPI video are themselves dynamic in accordance that the cameramen at the football game attempt to follow play.

The net effect of all this dynamism is non-obvious, and of a different order than even such video, or television, experience as is commonly accorded a network video director of a major sporting event who is exposed to a multitude of (live) video feeds. The experience of MPI video in accordance with the present invention may usefully be compared, and contrasted, with virtual reality. The term "virtual reality" commonly has connotations of (i) unreality, (ii) sensory immersion, and/or (iii) self-directed interaction with a reality that is only fantasy, or "virtual". The effect of the MPI video of the present invention differs from virtual reality in all these factors, but is nonetheless quite shocking.

In the first place, the present invention is not restricted to use with video depicting reality, but reality is the cheapest source of such information as can, when viewed through the MPI video system of the present invention, still be quite "intense". In other words, it may be unnecessary to be attacked by a fake, virtual, tiger when one can visually experience the onrush of a real hostile football linebacker.

In the second place, MPI video is presented upon a common monitor, or television set, and does not induce the viewer to believe that he or she has entered a fantasy reality.

Finally, and in the third place, the self-directed interaction with MPI video is directed to observational perspective, and not to a viewer's dynamic control of developments in the scene in accordance with his or her action, or inaction.

What MPI video can do, and what causes it to be "shocking", is that the viewer can view, or, in the American vernacular, "get into" the video scene just where, and even when, the viewer chooses. Who at a live sporting event has not looked at the cheerleaders, a favorite player, or even the referee? Psychological and sociological research has shown that, among numerous other differences between us all, men and women, as one example, do not invariably visually acquire the same elements of a picture or painting, let alone do the two sexes visually linger on such elements as they identify in common for equal time durations. (Women like [illegible]) Accordingly, MPI video removes [illegible] limitations that presently make a video or a television viewer only a passive participant in the video or television process (in the American vernacular, a [illegible]).

Of course, MPI video need not be implemented for each and every individual video or television viewer in order to be useful. Perhaps with the advent of communicating 500 channels of television to the home, a broadcast major American football game might reasonably consume not one, but 25+ channels: one for each player of both sides on the football field, one for each coach, one for the football, and one for the stadium, etc.

An early alternative may be MPI video on pay per view. It has been hypothesized that the Internet, in particular, may expand in the future to as likely connect smart machines to human users, and to each other, as it will to communicatively interconnect more and more humans, only. Customized remote viewing can certainly be obtained by assigning every one his or her own remotely-controllable TV camera and robotic rover. However, this scheme soon breaks down: can hundreds and thousands of individually-remotely-controlled cameras jockey for position and for viewer-desired vantage points at a single event, such as the birth of a whale, or an auto race? It is likely a better idea to construct a comprehensive video image database from quality images obtained from only a few strategically positioned cameras, and to permit universal construction of customized views from this database, all as is taught by the present invention.

As will additionally be seen, the MPI video of the present invention [illegible] video databases to be built in which databases are contained, dynamically and from moment to moment (frame to frame), much useful information that is interpretive of the scene depicted. Clearly, in order to select, or to synthesize, an image of a particular player, the MPI video system contains information of the player's present whereabouts, and image. It is thus a straightforward matter for the system to provide information, in the form of text or otherwise, on the scene viewed, either continuously or upon request.

Such auxiliary information can augment the entertainment experience. For example, a viewer might be alerted to a changed association of a football in motion from a member of one team to a member of the opposing team as is recognized by the system to be a fumble recovery or interception. For example, a viewer might simply be kept informed as to which player presently has possession of the football.

A more [illegible] use of such auxiliary information is [illegible] no longer be necessary to remain in ignorance of what one is viewing if, by certain [illegible] commands, "helps" to understanding the scene, and [illegible], may be obtained.

2. An Actual System Performing Multiple Perspective Interactive (MPI) Video in Accordance With the Present Invention, and Certain Limitations of this Exemplary System

The concept of MPI video is taught in the context of a sporting event. The MPI video model allows a viewer to be active; he or she may request a preferred camera position or angle, or the viewer may even ask questions about contents [illegible] rudimentary system automatically determines the best camera and view to satisfy the demands of the viewer.

Videos of American football have been selected as the video source texts upon which MPI video will be taught and demonstrated. Football video already in existence was retrieved and operated upon as a sample application of MPI video in order to demonstrate certain desirable features.
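The automatic selection described above, in which the rudimentary system determines the best camera and view for a viewer's request, can be illustrated with a short sketch. This is a minimal illustration only and is not the method of the specification: the `Camera` fields, the flat two-dimensional field coordinates, and the scoring rule (prefer a camera that is close to the object and whose optical axis points nearly at it) are all assumptions introduced here.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Camera:
    """Hypothetical camera record; every field is an illustrative assumption."""
    name: str
    x: float            # camera position on a flat field plane (arbitrary units)
    y: float
    heading_deg: float  # optical axis direction, degrees counter-clockwise from +x
    fov_deg: float      # full horizontal field of view, assumed > 0


def view_score(cam, ox, oy):
    """Score how well `cam` shows point (ox, oy); None if outside its field of view."""
    dx, dy = ox - cam.x, oy - cam.y
    dist = math.hypot(dx, dy)
    if dist == 0.0:
        return None  # object coincides with the camera; treat as not viewable
    bearing = math.degrees(math.atan2(dy, dx))
    # Smallest absolute angle between the bearing and the optical axis, in [0, 180].
    off_axis = abs((bearing - cam.heading_deg + 180.0) % 360.0 - 180.0)
    half_fov = cam.fov_deg / 2.0
    if off_axis > half_fov:
        return None
    # Assumed criterion: centered in the frame and near the camera scores highest.
    return (1.0 - off_axis / half_fov) / dist


def best_camera(cameras, ox, oy):
    """Return the highest-scoring camera for the point, or None if no camera sees it."""
    visible = []
    for cam in cameras:
        score = view_score(cam, ox, oy)
        if score is not None:
            visible.append((score, cam))
    if not visible:
        return None
    return max(visible, key=lambda pair: pair[0])[1]


def select_over_time(cameras, track):
    """For each (x, y) position of a tracked object, name the camera chosen for that frame."""
    names = []
    for ox, oy in track:
        cam = best_camera(cameras, ox, oy)
        names.append(cam.name if cam is not None else None)
    return names
```

Calling `select_over_time` with the successive positions of a followed object (for instance, a player) yields one camera name per frame, so the chosen camera changes as the object moves, which is the kind of successive camera selection the text describes. A real system would also need the camera calibration and object localization that this sketch simply takes as given.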
The particular, rudimentary, embodiment of an MPI video system features automatic camera selection and interaction using three-dimensional cursors. The complete computational techniques used in the rudimentary system are not fully contained herein this specification in detail because, by and large, known techniques hereinafter referred to are implemented. Certain computational techniques are, however, believed novel, and the mathematical basis of each of these few techniques is fully explained herein.

The rudimentary, demonstration, system of the present invention has been reduced to operative practice, and all drawings or photographs of the present specification that appear to be of video screens are representations or photographs of actual screens, and are not mock-ups. Additionally, where continuity between successive video views is implied, then this continuity exists in reality although, commensurate with the amount of computer resource and computational power harnessed to do the necessary transformations, the successive and continuous views and presentations may not be in full real time.

The running MPI video system is presently being extended to other applications besides American football. In particular, a detail teaching of the concept, and method, of generating a three-dimensional database required by the MPI video system of the present invention is taught and demonstrated in this specification not in the context of football, but rather, as a useful simplification, in the context of a university courtyard through which human and machine subjects (as opposed to football players) roam. The present specification will accordingly be understood as being directed to the enabling principles, construction, features and resulting performance of a rudimentary embodiment of an MPI video system, as opposed to presenting great details on any or all of the several separate aspects of the system.

3. Architecture of the MPI Video System

A physical phenomenon or an event can usually be viewed from multiple perspectives. The ability to view from multiple perspectives is essential in many applications. Current remote viewing via video or television permits viewing only from one perspective, and that perspective being that of an author or editor and not of the viewer. A viewer has no choice. However, remote viewing via video or television even under such limitations has been very attractive and has influenced our modern society in many aspects.

Technology has now advanced to the state that each of many simultaneous remote viewers (i) can be provided with a choice to so view remotely from whatever perspective they want and, with limitations, (ii) can interactively select just what in the remote scene they want to view.

Let us assume that an episode is being recorded, or being viewed in real time. This episode could be related, for example, to a scientific experiment, an engineering analysis, a security post, a sports event or a movie. In a simplest and most obvious case, the episode can be recorded using multiple cameras strategically located at different points. These cameras provide different perspectives of the episode. Each camera view is individually very limited. The famous parable about an elephant and the blind men may be recalled. With just one camera, only a narrow aspect of the episode may be viewed. Like a single blind man, a single camera is unable to provide a global description of an episode.

Using computer vision and related techniques in accordance with the present invention, it is possible to take individual camera views and reconstruct an entire scene. These individual camera scenes are then assimilated into a model that represents the complete episode. This model is called an "environment model". The environment model has a global view of the episode, and it also knows where each individual camera is. The environment model is used in the MPI system to permit a viewer to view what he or she wants from where he or she wants (within the scene, and within limits).

Assume that a viewer is interested in one of the following.

First, the viewer may be interested in a specific perspective, and may want to view a scene, an episode, or an entire video presentation from this specific perspective. The user may specify a real camera specifically. Alternatively, the viewer may only specify the desired general location of the camera, without actual knowledge of which if any camera is in such location.

Second, the viewer may be interested in a specific object. There may be several objects in a scene, an episode, or a presentation. A viewer may want to always view a particular object independent of its situation in the scene, episode, or presentation. Alternatively, the object that is desired to be viewed may be context sensitive: the viewer may desire to view the basketball until the goal is scored, and to then shift view to the last player to touch the basketball.

Third, the viewer may be interested in a specific event. A viewer may specify characteristics of an event and may want to view a scene, an episode, or a presentation from the best perspective for that event.

The high level architecture for a MPI video system so functioning is shown in a first level block diagram in FIG. 1. An image at a certain perspective from each camera 10a, 10b, . . . 10n is converted to its associated camera scene in camera screen buffers CSB 11a, 11b, . . . 11n. Multiple camera scenes are then assimilated into the environment model 13 by computer process in the Environ. Model Builder 12. A viewer 14 ([illegible] being part of the MPI video system of the present invention) can select his perspective at the Viewer Interface 15, and that perspective is communicated to the Environment Model via a computer process in Query Generator 16. The programmed reasoning system in the Environment Model 13 decides what to send via Display Control 17 to the Display 18 of the viewer 14.

Implementation of a universal, plug and play, MPI video system that can (i) track virtually anything, (ii) function in real time (i.e., for television), and/or (iii) produce virtually any desired image, including a full virtual image, severely stresses modern computer and video hardware technology circa 1995, and can quickly come to consume the processing power of a mini-supercomputer. Economical deployment of the MPI video system requires, circa 1995, advances in several hardware technology areas. Notably, however, there is, as will imminently be demonstrated, no basic hardware nor software function required by such a MPI video system that is not only presently realizable, but that is, in actual fact, already realized. Moreover, a relatively high level, user friendly, viewer interface, which might have been considered impossible or extremely difficult of being successfully achieved, "falls out" quite naturally, and to good effect, from the preferred implementation of, and the partitioning of function within, the MPI system.

A complete MPI video system with limited features can be, and has been, implemented using the existing technology. The exact preferred architecture of a MPI video system will depend on the area to which the system is intended to be applied, and the type and level of viewer interaction
allowed. However, certain general issues are in common to all implementations of MPI video systems. Seven critical areas that must be addressed in building any MPI video system are as follows.

First, a camera scene builder is required as a programmed computer process. In order to convert an image sequence of a camera to a scene sequence, the MPI video system must, and does, know where the camera is located, its orientation, and its lens parameters. Using this information, the MPI video system is then able to locate objects of potential interest, and the locations of these objects in the scene. This requires powerful image segmentation methods. For structured applications, the MPI video system may use some knowledge of the domain, and may even change or label objects to make its segmentation task easier. This is, in fact, the approach of the rudimentary embodiment of the MPI video system, as will be further discussed later.

Second, an environment model builder is required as a programmed computer process. Individual camera scenes are combined in the MPI video system to form a model of the environment. All potential objects of interest and their locations are recorded in the environment model. The representation of the environment model depends on the facilities provided to the viewer. If the images are segmented properly, then, by use of powerful but known computers and computing methods, it is possible to build environment models in real time, or almost in real time.

Third, a viewer interface permits the viewer to select the perspective that he or she wants. This information is obtained from the user in a friendly but directed manner. Adequate tools are provided to the user to point and to pick objects of interest, to select the desired perspective, and to specify events of interest. Recent advances in visual interfaces, virtual reality, and related areas have contributed to making the MPI video system viewer interface very powerful, even in the rudimentary embodiment of the system.

Fourth, a display controller software process is required to respond to the viewers' requests by selecting appropriate images to be displayed to each such viewer. These images may all come from one perspective, or the MPI video system may have to select the best camera at every point in time in order to display the selected view and perspective. Accordingly, multiple cameras may be used to display a sequence over time, but at any given time only a single best camera is used. This has required solving a camera hand-off problem.

Fifth, a video database must be maintained. If the video event is not in real time (i.e., television) then it is [illegible] is feature based and permits content-based operations. See Ramesh Jain and Arun Hampapur; "Metadata for video-databases" appearing in SIGMOD Records, December 1994.

In many applications of the MPI video system, environment models are also stored in the database to allow rapid interactions with the system.

Sixth, real-time processing of video must be implemented to permit viewing of real time video events, i.e. television. In this case a special system architecture is required to interpret each camera sequence in real time and to assimilate their results in real time so that, based on a viewer input, the MPI video system can use the environment model to solve the camera selection problem.

A practitioner of the computer arts and sciences will recognize that this sixth requirement is nothing but the fifth requirement performed faster, and in real time. The requirement might just barely be realizable in software if computational parallelism is exploited, but, depending upon simplifying assumptions made, a computer ranging from an engineering work station to a full-blown supercomputer (both circa 1995) may be required. Luckily, low-cost (but powerful) microprocessors are likely distributable to each of the Camera Sequence Buffers CSB 11a, 11b, . . . 11n in order to isolate, and to report, features and dynamic features within each camera scene. Correlation of scene features at a higher process level may thus be reduced to a tractable problem. Another excellent way of simplifying the problem, which way is used in the rudimentary embodiment of the MPI video system taught within this specification, is to demand that the scene, and each camera view thereof, include constant, and readily identifiable, markers as a sort of video "grid". An American football field already has this grid in the form of yard lines and hash marks. So might a college courtyard with benches and trees. [illegible] a whale swimming free in an amorphous tank while giving [illegible] spectrum, and presents an exceedingly severe camera image selection (if not also correlation) problem.

Seventh, a visualizer is required in those applications that require the displaying of a synthetic image in order to satisfy a viewer's request. For example, it is possible that a user selects a perspective that is not available from any camera. A solution is simply to select the closest camera, and to use its image. The solution of the rudimentary MPI video system of the present specification, which solution is far [illegible] implementation [illegible] trite in the benefits obtained, is to select a best, and not necessarily a closest, camera and to use its image and sequence.

Clearly implementation of an MPI video system with unrestricted capability requires state-of-the-art computer hardware and software, and will benefit by such improvement in both as are confidently expected. Some new issues, [illegible] seven, are expected to arise in addressing different applications of MPI video. At the present time, [illegible] this specification, only a rudimentary MPI video system is taught. By implementing this first MPI video system, the inventors have identified interesting future [illegible] computer vision, artificial intelligence, human interfaces, and databases. However, and for the moment, the following sections serve to discuss and teach an [illegible] system that was implemented to demonstrate the concept of the invention more concretely and completely, as well as to define and identify performance issues.

4. A Rudimentary, Prototype, [illegible] MPI [illegible] Video of American Football

Key concepts in MPI video are taught in this section 4, by reference to a rudimentary, prototype, embodiment of an MPI video system that was built particularly for multiple perspective interactive viewing of American football. The motivation of the inventors in selecting this domain was to find a domain that was realistic, interesting, non-trivial and sufficiently well structured so as to demonstrate many important concepts of MPI video. It is also of note that should the present MPI video system be applied commercially, it might already be possessed of such characteristics as would seemingly make it of some practical use in certain applications such as the "instant replay".

Many other sports and many other applications were considered by the inventors. American football was chosen
due to the several attributes of the game that make it highly structured both from (i) database and (ii) computer vision perspective. These issues of structure are hereinafter discussed in the context of the implementation of the rudimentary, prototype, embodiment of the MPI video system.

4.1 Scenario of Use, and Required Functions, of an MPI [...]

[...] in North America on conventional television, the broadcasts of these football games have several limitations from a viewer's perspective. Viewing of American football games could seemingly be significantly enhanced by adding the following facilities.

Usually a football game is captured by several cameras that are placed at different locations on the field. Though those cameras cover various parts of the game, viewers can get only one camera view at a time. This view is not a result of viewers' choice, but is instead what an editor thinks most people want to see. In most cases, the editor's decisions are right. In any case, with the current technology this expert selection of views is seemingly the best that can be done. If a viewer is interested in a certain player, or a shot from a different angle, then he or she cannot see the desired image unless the editor's choice happens to be the same as the viewer's. By giving choices to a viewer, it is anticipated that watching the game might be made significantly more interesting.

Moreover, when watching a football game, questions often occur to viewers such as "who is this player who just now tackled", or "how long did this player run in this play". Conventional video or television does not necessarily provide such information. Tools that provide such information would seemingly be useful.

Still further, while watching a video of a football game, a coach or a player may want to analyze how a particular player ran, or tackled, and to ignore all other players. An interactive viewing system should allow the viewing of only plays of interest, and these from different angles. Moreover, the video would desirably be good enough so that some detailed analysis would be capable of being performed on the video of the plays in order to study the precise patterns, and performance, of the selected player.

In the rudimentary MPI video system, viewers may both (i) select cameras according to their preference, and (ii) ask questions about the name(s), or the movement(s), of players. The following are some examples of interaction between viewers and the MPI video system.

The viewer may request that the MPI video system should show a shot of some upcoming play or plays taken from a camera located behind the quarterback.

The viewer may request that the MPI video system should show a best shot of a particular, viewer-identified, player.

The viewer may request that the MPI video system should show as text the name of the player to which the viewer points, with his or her cursor, on the screen of the display 18 (shown in FIG. 1).

The viewer may request that the MPI video system should highlight on the screen a particular player whose name the viewer has selected from a player list.

The viewer may request that the MPI video system should show him or her the exact present location of a selected player.

The viewer may request that the MPI video system should show him or her the sequence when a selected player crossed, for example, the 40 yard line.

The viewer may request that the MPI video system should show him or her the event of a fumble.

The viewer may request that the MPI video system should show all third down plays in which quarterback X threw the ball to the receiver Y.

To perform these functions, and others, the MPI video system needs to have information about (i) contents of the football scene as well as (ii) video data.

Some of the above, and several similar questions, are relevant to MPI television, while others are relevant to MPI video. The major distinction between MPI TV and MPI video is in the role of the database. In case of MPI video, it [...] preprocessing can transpire with the [...] information stored in a database. In case of MPI TV, most processing must be, and will be, in real time.

In the following section the rudimentary, prototype, MPI system discussed is, remarkably, an MPI TV system. A large random access video database system that is usable as a component of an MPI video system is realizable by conventional means, but is expensive (circa 1995) in accordance with the amount of video stored, and the rapidity of the retrieval thereof.

In the rudimentary, prototype, MPI TV system, as shown in FIG. 2, a football scene is captured by several cameras and analyzed by a scene analysis system. The information obtained from individual cameras is used to form the environment model. The environment model allows viewers to interactively view the scene.

Additionally, a prototype football video retrieval system has been implemented, as hereafter explained. This system incorporates some of the above-listed functions such as automatic camera selection and pointing to players. Other functions are readily susceptible of implementation using the same, existing, hardware and software technologies as are already within the rudimentary embodiment of the system.

4.1.1 Overview of the MPI Football Video/Television System

The configuration of the MPI football video/television system is shown in FIG. 3. The current system consists of a UNIX workstation, a laser disc player, a video capture board, and a TV monitor and graphical display. The TV monitor is connected to the laser disc player. The laser disc player is controlled by the UNIX workstation. A graphical user interface is built using X-window and Motif on the graphical display.

In use of the system, video of a football game was recorded on a laser disc. The actual video recorded was a part of the 1994 Super Bowl game. Since this video footage was obtained by commercial broadcast, the inventors did not have any control on camera location. Instead, the camera positions were reverse engineered using camera calibration algorithms. See R. M. Haralick and L. G. Shapiro; Computer and Robot Vision, Addison-Wesley Publishing, 1993.

Next, selected parts of the Super Bowl football game in which views from three different cameras were shown were selected. The three views were, of course, broadcast at three separate times. They depict an important, and exciting, play in the 1994 Super Bowl game. This selection was necessary to simulate the availability of separate video streams from multiple cameras.

This video data was divided into shots, each of which corresponds to one football play. Each shot was analyzed and a three-dimensional scene description (to be discussed in considerable detail in sections 8-9 hereinafter) was generated. Shots from multiple cameras were combined into the environment model. The environment model contains information about position of players and status of cameras. The environment model is used by the system to allow MPI
video viewing to a user. User commands are treated as [...] the system and are handled by the environment model and the database.

The interactive video interface of the system is shown in FIG. 4. The video screen of FIG. 4 shows video frames taken from laser disc. Video control buttons control video playback. Using a camera list, a viewer can choose any camera. Using a player list, a viewer can choose certain players to be focused on. If a viewer doesn't select a camera, then the system automatically selects the best camera. Also, multiple viewers can interact using the three-dimensional cursor. These new features are described below. Some interface features for the interactive video are shown here. A user can select one of the many items to focus in the scene.

4.2 Automatic Camera Selection

At any moment, there [...] game. Automatic [...] the best camera according to the preference of a user [...] camera 3 the player is too small. Different cameras provide focus on different objects. Depending on the current interest, an appropriate camera must be selected.

This function is performed by the system in the following way. First, viewers select the player that they want to see. Then the system looks into information on player position and camera status in the environment model to determine which camera provides the best shot of the player. Finally the selected shot is routed to the screen.

4.3 Interaction Using Three-Dimensional Cursors

In accordance with the present invention, a three-dimensional cursor is introduced in support of the interaction between viewers and the MPI video/TV system. A three-dimensional cursor is a cursor that moves in three-dimensional space. It is used to indicate a particular position in the scene. The MPI video/TV system uses this cursor to highlight players. Viewers also [...] Examples of interaction using three-dimensional cursors are shown in FIG. 6. As shown in FIG. 6, the cursor consists of five lines. Three of the five lines indicate the x, y and z axes of the three-dimensional space. The intersection of these three lines shows cursor position. The other two lines indicate a projection of the three lines onto the ground. The projection helps viewers have correct information of cursor position.

A viewer can manipulate the three-dimensional cursor so as to mark a point in the three-dimensional space. The projection of the three-dimensional cursor is a regular cursor centered at the projection of this marked point.

Both viewers and the MPI system use the three-dimensional cursor to interact with each other. In the first example of FIG. 6, a viewer moves the cursor to the position of a player and asks who this player is. The MPI system then compares the position of the cursor and the position of each player to determine which player the viewer is pointing to.

In the second example of FIG. 6, a viewer gives the system a name of a player and asks where the player is. The MPI system then shows the picture of [...] overlays the cursor on the position of the player so as to highlight the player.

5. Three-dimensional Scene Analysis

The purpose of scene analysis is to extract three-dimensional information from video frames captured by cameras. This process is performed in the following two steps.

First, 2-D information is extracted. From each frame, feature points such as players and field marks are extracted and a list of feature points is generated.

Second, 3-D information is extracted. From the two-dimensional description of the video frame, three-dimensional information in the scene, such as player position and camera status, is then extracted.

The details of these extractions are contained within the following sub-sections.

5.1 Extracting Two-dimensional Information

In the extraction of two-dimensional information, feature points are extracted from each video frame. Feature points [...] separate items in the images. First, the players are [...] by using their feet as feature points. Second, the field marks of the football field are used as feature points. As is known to fans of American football, an American football field has yard lines to indicate yardage between goal lines, [...] a set distance from the side [...] of the field. Field [...] feature points because their exact position is known a priori [...] and their registration and detection can be used to determine camera status.

In the rudimentary, prototype, MPI system, the feature points are extracted by human-machine interaction. This process is currently carried out as follows. First, the system displays a video frame on the screen of Display 18 (shown in FIG. 1). A viewer, or operator, 14 locates some feature points on the screen and inputs required information for each feature point. The system reads image coordinates of the feature points and generates a two-dimensional description.

The process results in a two-dimensional description of a video frame that consists of a list describing the players and a list describing the field marks. The player descriptions include each player's name and the coordinates of each [...]. The field mark descriptions include the [...] three-dimensional world), and the image positions [...]

[...] all feature points are specified interactively with the aid of human intelligence. Many features can be detected automatically using machine vision techniques. See R. M. Haralick and L. G. Shapiro, op. cit. The process of automatically detecting features in arbitrary images is not trivial, however. It is anticipated, however, that two trends will help the process of feature point identification in MPI video. First, new techniques have recently been developed, and will continue to be developed, that should be useful in permitting the MPI video system to extract feature point information automatically. Future new techniques may include some bar-code like mechanism for each player, fluorescent coloring on the players' helmets, or even some [...] devices that will automatically provide the [...] each player to the system. It is also anticipated that [...] for dynamic vision and related areas may suitably be adapted for the MPI video application.

Because the goal of the rudimentary, prototype, system is primarily to demonstrate MPI video, no extensive effort has been made to extract the feature points automatically. Further [...] capabilities in this area is deemed straightforward, and susceptible of implementation [...] practitioner of digital video [...]

5.2 Extracting Three-dimensional Information

The purpose of this step is to obtain three-dimensional information from the two-dimensional frames. The relationship between the three-dimensional [...] video frames captured by the cameras is shown in FIG. 7. Consider that a camera is observing a point (x, y, z). A point
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(u, V) io the image coordiiiate system to whidi the poiDt (\, status is estimated by intdpolatioo between key £rames by 
y, t) is mqipcd may be determined by the foUowing f^oceeding under the assumption that co^dinate values 
relationships, which relationships oon^se a coordinate change lineaily between a consecutive two key frames, 
system for camera calibration. 5.4 Camera Hand-Off 

A point (X, y, z) in the world cocadinate Systran is 5 The rudimentary, prototype, MPI video system Is able to 
transformed to a point (p» q, s) in the camera coordinate determine and select a single best camera to show a par- 
system by die following equation Ocular player or an event This is determined by the system 

using the environment model ^ectively^ for the given 
player* s location, the system uses reverse mapping for given 
10 camera locations, and then determines where the image of 
the player will be in the image for diflferent cameras. 
At the present time, the system selects the camera in 

where R is a transformation matrix from the world coordi. ^}^^ ^^5^ P^^^ ^^^/^^^^ 

nate system to the camera coordinate system, and (Xo, yo. z^) viewing area. TTie system could prospertively be made mc^ 

is the position of die camera. ^ ^ u precise by considering the OTientatton of the pUya also. The 

A point (p,q,s) in the camera coordinate system is pro- problem of transferring display control from one camera to 

jected to point (u,v) on the image pUne according to &c another is caUed the "camera hand-<^ problem". 




y-yo 



6. Results erf the Exercise of the Rudimentary MPI 
20 ^deo System 



following equation: 

L *' J >■ « J The rudimentary, prototype, MPI video system has been 

exercised on a very simple football scene imaged from three 
where f is camera parameter that determines ttie degree of different cameras. The goal of dus example is no demon- 
zoom in or zoom out strate the mediod and apparams of the invention, and die 

Thus, wc sec that an image coordinate (u,v) which cor- ^5 f^ij,mty obtaining practical results. The present imple- 

responds to world coordinate (x,y,z) is determined depend- mentation and embodiment can clearly be extended process 

ing on die (i) camera position, (ii) camera angle and (ii) ^^^^^ sequences, and also to different applications, and, 

camera parameto:. indeed, is already being so extended. 

Therefore, from two-dimensional inf<xmation that rs ^ ^ ^/ ^ ^1 ^ _i 

described at;>ve, we can obtain diree-dimensional camera 30 T^eat^^ 

and player infoi^tion in die following way. (See R. M. T^"^^ '^"^"^^ ^^'u "^^c"^ 

SLr^ck and L. G. Shapiro; Cor^uter ond RLt V^on. «>^^ f ^! """^t '^"^^^^^'^^ 

VrJirZ!^ D^.KiicWnr. 1 00^^ through 8c. These durce shots record the same footbaU play 

AcM^on-WesleyPubli^gam) buta« taken from different camera angles. Each shot lasted 

First, a camera cahhration is performed. If only one jfiffcrcnT^ 

known point is observed, a pair of unage coordinates and 35 sccunus. imk uuw umacui uiuo 

. \!l uToJ^KmVo Ai« i™«,n separate, but related, sequences. These sequences arc 

wcrld cooidmatcs may be known. By ^lymg dus known *~ [ j„ ri,. 

. ^ * J* *i_ used to build the model of events in the scene, 
pair to the above equations, two equations regarding die 

seven parametCTS tiiat determine camera stams may be Key frames were selected as previously explained, and 

obtained Observing at least four known points will suffice scene analysis was applied. In tiie process of scene analysis, 

to OTovide die minimum equations to solve die seven 40 at least three field marks for each key frame. This reference 

unknown paramrters. information was subsequendy used as known points in order 

However in die appUcation of die MPI video system to to solve die three unknown parameters diat determine cam- 
football, die (i) camera position is usually fixed, and (ii) die era status. Note that diis entire step could be avoided if a 
rotation angle is 2«o. This reduces die number of unknowns priori knowledge ci die camera status was available. It is 
to duee, which requires minimum (rf two known points. The 45 Ukely than in early, tclcvisioD network, applications of the 
field marks extracted in previous process are tiien used as MPI video system in coverage of structured events like 
known points. American foodwll diat Uie camera (i) positions and (ii) status 

Next an image to world coordinate mappingSis per- parameters will be known, and continuously known, to die 

formed." Once die camera statu^which is described by die MPI video system To sudi extent as diey are known diey 

seven parameters above— is known, the world cowlinate 50 obviously need not be calculated. 

may be determined from die image coordinate if it consid- In application of the scene analysis process to the actual 

ffed tiiat the point is constrained to lie in a {toe. In the video data it was found that not all video frames have 

application of the MH video system to football, the imaged enough known points. An example of a video frames that 

footbaU players are always approximately on the ground. lacks suf&cient known points is shown in FIG. 9£i.T1us may 

Accordingly, the positions of players can be determined 55 be contrasted widi a video frame having more than sufficient 

according to the above equations. known points as is shown in FIG. 9a. In the experimental 

5.3 Interpolation data used, 14 out of 15 key frames from camera 1 had at least 

Ideally die scene analysis process just described should be three (3) known points, while none of seven (7) key frames 

applied to every video frame in order to get the most precise from camera 2, and eight (8) out of fourteen ( 14) k^ frames 

information about (i) the location of players and (ii) the 60 from camera 3, had three (3) or more obvious known points, 

events in die scene. However, it would require significant The difference between the cameras was that camcxa 1 was 

human and con^utational effort to do so in the rudimentary, placed at high position while cameras 2 and 3 were placed 

prototype, MPI video system because feature points arc at low positions. Accordingly, estimates had to be made for 

located manually, and not by automation. Therefore, one key those video frames dut didn't show enough obvious known 

frame has been manually selected for evoy diirty frames, <5 points. The results of such estimations arc not necessarily 

and scene analysis has t>een applied to die selected key accurate. Many known points an this image can be used for 

frames. For frames in bctweea player position and camera camera calibration. 
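
The mapping of section 5.2, the key-frame interpolation of section 5.3, and the camera choice of section 5.4 can be sketched together in a few lines. This is only an illustrative sketch, not the inventors' implementation. It assumes the standard pinhole form (u, v) = (f/s)(p, q) for the projection, takes the rotation R as a ready-made 3x3 matrix, and assumes that the "best" camera is the one in which the player's image falls nearest the image center; the function names and the sample camera table are hypothetical.

```python
import math

def world_to_camera(R, cam_pos, point):
    """Apply (p, q, s) = R * (point - cam_pos); R is a 3x3 nested list."""
    d = [point[i] - cam_pos[i] for i in range(3)]
    return tuple(sum(R[r][c] * d[c] for c in range(3)) for r in range(3))

def project(f, pqs):
    """Perspective projection (u, v) = (f / s) * (p, q); s must be positive."""
    p, q, s = pqs
    if s <= 0:
        raise ValueError("point is not in front of the camera")
    return (f * p / s, f * q / s)

def interpolate(frame, key_a, key_b):
    """Linearly interpolate coordinate values between two key frames.

    key_a and key_b are (frame_number, values) pairs, key_a being earlier.
    """
    (fa, va), (fb, vb) = key_a, key_b
    t = (frame - fa) / (fb - fa)
    return tuple(a + t * (b - a) for a, b in zip(va, vb))

def best_camera(cameras, player_xyz):
    """Return the name of the camera whose image of the player lies nearest
    the image center (0, 0). The centrality criterion is an assumption."""
    best_name, best_dist = None, math.inf
    for name, cam in cameras.items():
        pqs = world_to_camera(cam["R"], cam["pos"], player_xyz)
        if pqs[2] <= 0:  # player is behind this camera
            continue
        u, v = project(cam["f"], pqs)
        dist = math.hypot(u, v)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name

# Hypothetical two-camera setup, both looking along the world z axis.
IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
CAMERAS = {
    "camera 1": {"R": IDENTITY, "pos": (0.0, 0.0, 0.0), "f": 1.0},
    "camera 2": {"R": IDENTITY, "pos": (5.0, 0.0, 0.0), "f": 1.0},
}
```

With the sample table, `best_camera(CAMERAS, (5.0, 0.0, 10.0))` picks "camera 2", since the player projects to the center of that camera's image. A production system would also have to test whether the projected point lies inside the actual frame bounds, which this sketch omits.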

Some examples of actual results obtained by use of the rudimentary, prototype, MPI system are shown in FIG. 10. These illustrated results were obtained by selecting "Washington" as a player to be focused on. For each [...] frame a three-dimensional cursor was overlaid according to the position of "Washington". [...] see that the results of scene analysis are [...] accurate according to the following observation.

First, the positions of the player "Washington" that a human may read from the video frames are close to the values that the system calculates. The values calculated by the MPI video system are shown below each picture in FIG. 10.

Second, each axis of the three-dimensional cursors appears to agree with the direction of the football field that a human may read from video frames.

Third, the three-dimensional cursor appears to be close to the chosen player "Washington" in the screen video image.

Other frames were checked as well. It has been confirmed that the results of the MPI video system to isolate, and to track, "target" objects of interest are mostly accurate, at least for those frames that contain enough known points to calibrate.

7. Global Multi-Perspective Perception in the MPI Video System

The present section 7 and the following sections expound the most conceptually and practically difficult portion of the MPI video system: its capture, organization and processing of real-world events in order that a system action (such as, for example, an immediate selection, or synthesis, of an important video image, e.g., a football fumble or an interception) may be predicated on this detection. Until this task is broken down into tractable parts in accordance with the present invention, it may seem to require a solution in the areas of machine vision and/or artificial intelligence, and to be of such awesome difficulty so as to likely be intractable, and impossible of solution with present technology. In fact, it is possible to make such significant progress on this task by use of modern technology applied in accordance with the present invention so as not only to get recognizable results, but so as to get results that are by some measure useful, and arguably even cost effective.

In accordance with the present invention of Multiple Perspective interactive (MPI) video, an omniscient multi-perspective perception system based on multiple stationary video cameras permits comprehensive live recognition, and coverage, of objects and events in an extended environment. The system of the invention maintains a realistic representation of the real-world events. A static model is built first using detailed a priori information. Subsequent dynamic modeling involves the detection and tracking of people and objects in at least portions of the scene that are perceived (by the system, and in real time) to be the most pertinent.

The perception system, using camera hand-off, dynamically tracks objects in the scene as they move from one camera coverage zone to another. This tracking is possible due to several important aspects of the approach of the present invention, including (i) strategic placement of cameras for optimal coverage, (ii) accurate knowledge of scene-camera transformation, and (iii) the constraining of object motion to a known set of surfaces.

In this and the following sections 8-9 of this specification, (i) a description of particularly the novel pattern and event recognition capability of the MPI video system of the present invention, and (ii) certain results presently obtainable with the system, are shown and discussed [...] of a [...] campus, to wit: a courtyard of the Engineering School at the University of California, San Diego. This environment is chosen in lieu of (a possible alternative choice) further discussion of a football field and a football game because [...] generally how (i) cameras may be placed for 'optimal' coverage, (ii) accurate knowledge facilitates scene-camera transformation, and (iii) object motion may be constrained to a known set of surfaces.

[...] considering only (iii) object motion, the exemplary environment contains (i) one object, a walker, that follows a prescribed and predetermined dynamic path, namely a walkway path. The exemplary environment contains (ii) still other objects, other [...] walkers, that do not even know that they are in any scene, a system, or an experiment and who accordingly move as they please in unpredetermined patterns (which are nonetheless earthbound). Finally, the exemplary environment contains (iii) an object, a robot, that is not independent, but which rather moves in the scene in response to static and dynamic objects and events therein, such as to, for example, traverse the scene without running into a static bench or a dynamic human.

It will therefore be recognized that even more is transpiring in the exemplary courtyard environment than on the previously-discussed football field, and that while this exemplary courtyard environment is admittedly arbitrary, it is very rich in static and dynamic objects important to exercise and demonstration of an omniscient multi-perspective perception capability of the MPI video system of the present invention.

7.1 Organization of the Teaching of Global Multi-Perspective Perception in the MPI Video System

Global Multi-Perspective Perception is taught and exercised in a campus environment containing (i) a mobile robot, (ii) stationary obstacles, and (iii) people and vehicles moving about, actors in the scene that are shown diagrammatically in FIG. 11. In the present approach an omniscient multi-perspective perception system uses multiple stationary cameras which provide comprehensive coverage of an extended environment. The use of fixed global cameras simplifies visual processing.

Dynamic objects in the environment, including the robot, can be easily and accurately detected by (i) integrating motion information from the different cameras covering these objects, and, importantly to the invention, (ii) constraining the environment by analyzing only such motion as is constrained to be to a small set of known surfaces.

The particular global multi-perspective perception system that monitors the campus environment containing people, vehicles and the robot uses the several color and monochrome CCD cameras also diagrammatically represented in FIG. 11. This particular perception system is not only useful in the MPI video system, but is also useful in any completely autonomous system with or without a human in the loop, such as in the monitoring of planes on airport runways.

The operation of the global multi-perspective perception system is discussed in both human-controlled and autonomous modes. In the preferred system, individual video streams are (i) processed on separate work stations on the local network and (ii) integrated on a special purpose graphics machine on the same network. The [...] system, the particular experimental setup, and pertinent performance issues, are described as follows.

The next section 8 describes the preferred approach and the principle behind camera coverage, integration and camera hand-off. The prototype global multi-perspective perception system, and the results of experiments thereon, are next described in section 9. The approach of the present invention is, to the best present knowledge of the inventors, a revolutionary application of computer vision that is immediately practically useable in several diverse fields such as intelligent vehicles as well as the interactive video applications (such as situation monitoring and tour guides, etc.) that are the principal subject of the present specification.

The applicability of the prototype global multi-perspective perception system to just some of these applications is presented in section 10. Opportunities for further improvements and expansions are discussed in section 11.

8. Multi-Perspective Perception

Multi-perspective perception involves each of the following.

First, the "expectations" that various objects will be observed must be generated from multiple different camera views by use of each of (i) a priori information, (ii) an environment model and (iii) the information requirements of the present task. The statement of the immediately preceding sentence must be read carefully because the sentence contains a great deal of information, and an important characterization of one aspect of the present invention. Each of (i) a priori information, (ii) an environment model, and (iii) the information requirements of the task, have variously been considered, and melded into, prior art systems for, and methods of, machine perception. Note however, that the first sentence of this paragraph is definitive. Next, note that the use of the (i) information, (ii) environment model, and (iii) information requirements is to generate, specifically from multiple different camera views, something called "expectations". These "expectations" are the probabilities that a (i) particular object will be observed (ii) at a particular place.

Second, objects from each camera must be independently detected and localized. This is not always done in the prior art, although it is not unduly complex. Simple motion detection is mostly used in the preferred embodiment of the present, prototype, global multi-perspective perception system.

Next, the separate observations are assimilated into a three-dimensional model. In this step, the preferred embodiment of the present invention leaves familiar ground quickly, and plunges into a new construct for any perception system, whether global and/or multi-perspective or not.

Fourth, and finally, the model is used in performing the required tasks. Exactly what this means must be postponed until the "model" is better understood.

objects, states of objects in the scene (e.g., a particular human is mobile, or the robot vehicle immobile), etc.

8.1. Three-dimensional Modeling

The three-dimensional model of the preferred embodiment of the prototype multi-perspective perception system in accordance with the present invention is created using information from multiple video streams. This model provides information that cannot be derived from a single camera view due to occlusion, size of the objects, etc. Reference S. Chatterjee, et al., op. cit.

A good three-dimensional model is required to recognize complex static and moving obstacles. At a basic level, the multi-perspective perception system must maintain information on the positions of all the significant static obstacles and dynamic objects in the environment. In addition, the system must extract information from both the two-dimensional static model as well as the three-dimensional dynamic model. As such, a representation must be chosen that (i) facilitates maintenance of object positional information as well as (ii) supporting more sophisticated questions [...] object behavior.

While information representation can be considered an implementation issue, the particular representation chosen will significantly affect the system development. Thus, information representation is considered to be an important element [...] preferred multi-perspective perception system, and of [...] architecture. In the preferred system, geometric information is represented as a combination of voxel representation, grid-map representation and object-location representation. Specific implementations and domains deal with this differently.

When combined with information about the exact position and orientation of a camera, the a priori knowledge of the static environment is a very rich source of information which [...] previously received much attention. For each single [...] dimensional position of a dynamic object detected by its [...] (i) a priori information about the scene and (ii) the camera parameters are coupled with (iii) the assumption that all dynamic objects move on the ground surface.

Using this information it is a straightforward exercise for a practitioner of the computer programming arts to compute [...] of the line that passes through the camera projection point and a given feature on its image plane. By assuming that the lowest image point of a dynamic object is on the ground, the approximate position of the object on the ground plane is readily found. Positional information obtained from all views is assimilated and stored in the 2D grid representing the viewing area.

A hig^-level schematic diagram of the different compo- For the case where an object is observed by more than one 
nents of the preferred embodiment of the prototype multi- camera, the three-dimensional voxel representation is par- 
perspective perception system in accordance with the ticularly efiScadous. Here a dynamic object recorded on an 
present invention is shown in FIO. 12. A study of the image plane projects into some set of voxels. Multiple views 
diagram will show that the system includes both two- 55 of an <^ject will produce multiple projections, one for each 
dimensional and three-dimensional processing. Reference S. camera. The intersection of all such projections provides an 
Chatterjee. R. Jain. A. Katkere, P. Kelly, D. Y. Kuranuira, estimate of the 3-dimcosioaal form of the dynamic object as 
and S. Moczzi; Modeling and interactivity in MPI-\^dco, illustrated in FIG. 13 for an object seen by four cameras. 
Technical Report VCL-94~103, Visual Computing This section and its accon^>anying illustrations — short as 
Laboratory. University of Califomia, San Diego, December 00 they may be — have set forth a complete disclosure of how 
1994, to make two- and three-dimensional models of the scene. It 

Two key aspects of the architecture diagrammed in FIG. no remains only to use such models, in conjunction with 

12 arc the (i) static model and the (ii) dynamic model. The other information, for useful purposes, 

static model contains a priori information such as camera Z,2 AutcHnatic Camera Handoff 

calibration parameters, look-up tables and obstacle informa- 65 Camera handoff should be understood to be the event in 

tion. The dynamic model contains task specific information which a dynamic object passes from one camera coverage 

like two dimensional and three dimensional maps, dynamic zone to another. The multi-perspective perception system 
must maintain a consistent representation of an object's identity and
behavior during camera handoff. This requires the maintenance of
information about the object's position, its motion etc.

Camera handoff is a crucial aspect of processing in the
multi-perspective perception system because it integrates a variety of
key system components. Firstly, it relies on accurate camera
calibration information, static model data. Secondly, it requires
knowledge of objects and their motion through the environment
determined from the dynamic model. Finally, the camera handoff can
influence dynamic object detection processing.

This section 8 has described the architecture, and some important
features, of the multi-perspective perception system. Reference also
S. Chatterjee, et al. op. cit. The next section describes in detail
the preferred implementation of the multi-perspective perception
system for the application of monitoring a courtyard.

9. Setup of the Multi-Perspective Perception System, and Results of
System Use

The implementation of an integrated Multiple Perspective Interactive
(MPI) video system demands a robust and capable implementation of the
multi-perspective perception subsystem. To simplify the teaching of
the multi-perspective perception subsystem, and since this subsystem
standing alone is useful in several other applications (described in
Section 4) than just MPI video, the following describes the
multi-perspective perception subsystem as a stand-alone system
independent of the MPI video system of which it is a part. It will be
understood that, once the object identifications, object tracking, and
multiple perspective [...] multi-perspective perception subsystem are
[...], it is a straightforward matter to use these results in the MPI
video system. (For many purposes of supplying information to the video
viewer, only a high-level viewer interface is required to access the
considerable current information of the multi-perspective perception
subsystem.) The following sections describe the multi-perspective
perception subsystem/system in detail.

9.1 Multi-Perspective Perception System Prototype

9.1.1 Setup and Use

The initial development and exercise of the multi-perspective
perception system took place in a laboratory on a [...] color
sequence. A one minute long scene was digitized from four color CCD
cameras overlooking a typical campus scene. The one minute scene
covers two pedestrians, two cyclists, and a robotic vehicle moving
between coverage zones. A schematic of this scene is shown in FIG. 14,
consisting of FIG. 14a and FIG. 14b.

For calibration and experimental evaluation of the prototype system,
one of the two pedestrians walked on a predefined known path. No
restrictions were placed on other moving objects in the scene.

9.1.2 Digitalization

The four views of the scene were digitized using a frame-addressable
VCR and frame capture board combination. The synchronization was done
by hand using synthetic synchronization points in the scene (known as
hat drops). The resulting image sequences were placed on separate
disks and controllers for independent distributed access. Having an
extended pre-digitized sequence (i) accorded repeatability and (ii)
permitted development of the perception system without the
distractions and time consumption of repeated digitalization of the
scene. The source of the scene image sequence was transparent to the
perception system and was, in fact, hidden behind a virtual frame
grabber. Hence, the test was not only realistic, but migration of the
perception system into (i) real-time using (ii) real video frame
capture boards proved easy.

9.1.3 Calibration

Calibration of the cameras in the perception system is important
because accurate camera-world transformation is vital to correct
system function. The cameras are assumed to be calibrated a priori, so
that precise information about each camera's position and orientation
could be used either directly, or by use of pre-computed camera
coverage tables, to convert two dimensional observations into three
dimensional model space, and, further, three dimensional expectations
into 2D.

For the experimental exercise of the perception system, a complete
geometric three dimensional model of the courtyard was built using map
data. This information was then used for external calibration of each
camera. Calibration was done with a user in the loop. The static model
was viewed from a location near the actual camera location and the
user interactively modified the camera parameters until the visualized
view exactly matched the actual camera view (displayed underneath).

9.1.4 Distributed Architecture

At the University of California, San Diego, cameras are physically
distributed throughout the campus to provide security coverage.
Because the experimental use of the perception system requires
synchronized frames from these cameras at a very fast rate, frame
capture was done close to the camera on separate computers. For
modularity and real-time video processing it is very important that
the video be independently processed close to the sources thereof. The
preferred hardware setup for the experimental exercise is pictorially
diagrammed in FIG. [...]. Several independent heterogeneous computers
(Sun SPARCstation models 10 and 20 and/or SGI models Indigo2, Indy and
Challenge) were selectively used based on criteria including (i) the
load on the CPU, and the computer throughput, (ii) computer proximity
to the camera and availability of a frame capture board (for real-time
setup) and (iii) the proximity of each computer to a storage location,
measured in Mbps (for the experimental setup).

The work stations in the experiment were connected on a 120 Mbps
ethernet switch which guaranteed full-speed point-to-point connection.
A central [...] was used to control the four video processing
workstations, to maintain the environment model (and associated
temporal database), and, optionally, to communicate results to another
computer process such as that exercising and performing an MPI video
function.

The central master computer and the remote slave computers communicate
at a high symbolic level; minimal image information is exchanged.
Hence only a very low network bandwidth is required for master-slave
communication. The master-slave information exchange protocol is
preferably as follows:

First, the master computer initializes graphics, the database and the
environment model, and waits on a pre-specified port.

Second, and based on the master computer's knowledge of the network,
machine throughput etc., a separate computer process starts the slave
computer processes on selected remote machines.

Third, each slave computer contacts the master computer, using a
pre-specified machine-port combination, and an initialization
hand-shaking protocol ensues.

Fourth, the master computer acknowledges each slave computer and sends
the slave computer initialization information such as (i) where the
images are actually stored (for the laboratory case), (ii) the
starting frame and frame interval, and (iii) camera-specific
image-processing information like thresholds, masks etc.

Fifth, the slave initializes itself based on the information
sent by the master computer.

Sixth, once the initialization is completed, the master
computer, either synchronously or asynchronously depending
on application, will process the individual cameras as
described in following steps seven through nine.

Seventh, whenever a frame from a specific camera needs
to be processed, the master computer sends a request to
that particular slave computer with information about
processing the frame: focus of attention windows,
frame-specific thresholds and other parameters, current and
expected locations and identifications of moving objects,
etc., continuing during this processing any user interaction.
In synchronous mode, requests to all slave computers are sent
simultaneously and the integration is done after all slave
computers have responded. In asynchronous mode, this will
not necessarily proceed in unison.

Eighth, when a reply is received, the frame information is
used to update the environment model and the database as
described in following Section 9.1.7.

The next sections describe the communication traffic
between the master and the slave computers.
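
The symbolic character of the exchange can be illustrated with a short
sketch. It is not the prototype's protocol: the message fields, the
JSON encoding, and the names FrameRequest, FrameReply,
synchronous_round and stub_slave are assumptions introduced only for
illustration, and the network transport is replaced by plain function
calls. The sketch shows a per-frame request carrying attention windows,
a threshold and expected object locations, a reply carrying only
bounding boxes, and a synchronous round in which integration happens
after every slave has replied.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class FrameRequest:
    """Master -> slave: what to process in one frame (symbolic, no pixels)."""
    camera_id: int
    frame_number: int
    attention_windows: list  # [[x0, y0, x1, y1], ...] in image coordinates
    threshold: int           # frame-specific difference threshold
    expected_objects: dict   # object id -> [x, y] expected image location


@dataclass
class FrameReply:
    """Slave -> master: filtered detections for one frame."""
    camera_id: int
    frame_number: int
    bounding_boxes: list     # [[x0, y0, x1, y1], ...]


def encode(message) -> bytes:
    """Serialize a message as JSON, tagged with its type name."""
    payload = {"type": type(message).__name__}
    payload.update(asdict(message))
    return json.dumps(payload).encode("utf-8")


def decode(raw: bytes) -> dict:
    """Parse a message produced by encode() back into a dictionary."""
    return json.loads(raw.decode("utf-8"))


def synchronous_round(slaves, frame_number, make_request, integrate):
    """Send one request per slave; integrate only after all have replied."""
    raw_replies = [
        send(encode(make_request(camera_id, frame_number)))
        for camera_id, send in slaves.items()
    ]
    for raw in raw_replies:
        integrate(decode(raw))
    return len(raw_replies)


def stub_slave(raw_request: bytes) -> bytes:
    """Stand-in for a remote slave: reports each attention window as a box."""
    request = decode(raw_request)
    reply = FrameReply(request["camera_id"], request["frame_number"],
                       request["attention_windows"])
    return encode(reply)
```

Because each message carries a handful of numbers rather than image
data, a request in this sketch is on the order of a few hundred bytes,
which is consistent with the low bandwidth requirement stated above.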

9.1.5 Modeling and Visualization 

A communication master computer that manages all slave
computers, assimilates the processed information into an
environment model, processes user input (if any), and sends
information to the MPI video process (if any), resides at the
heart of the multi-perspective perception system. In the
preferred prototype system, this master computer is an SGI
Indigo2 work station with high-end graphics hardware. This
machine, along with graphics software (OpenGL and
Inventor), was used to develop a functional Environment
Model building and visualization system. Reference J.
Neider, T. Davis, and M. Woo; OpenGL™ Programming
Guide: Official Guide to Learning OpenGL, Release 1,
Addison-Wesley Publishing Company, 1993. Reference also
J. Wernecke; The Inventor Mentor: Programming
Object-Oriented 3D Graphics with Open Inventor™, Release 2,
Addison-Wesley Publishing Company, 1994.

In the preferred system, Inventor manages the scene
database and OpenGL performs the actual rendering. A
"snapshot" view of the visualization system of the master
computer, including four camera views, and a rendered
model showing all the moving objects in iconic forms, is
shown in FIG. 18.

9.1.6 Video Processing 

One of the goals of the exercise of the multi-perspective
perception system was to illustrate the advantages of using
static cameras for scene capture, and the relative simplicity
of visual processing in this scenario when compared to
processing from a single camera. While more sophisticated
detection, recognition and tracking algorithms are still being
developed and applied, the initial, prototype
multi-perspective perception system uses simple yet robust
motion detection and tracking.

In the prototype system, and as described in previous
sections, the processing of individual video streams is done
using independent video processing slaves, possibly running
on several different machines. The synchronization and
coordination of these slaves, any required resolution of
inconsistencies, and generation of expectations is done at the
master.

Independent processing of information streams is an
important feature of the information assimilation
architecture of the present invention, and is a continuation
and outgrowth of the work of some of the inventors and their
colleagues. See, for example, R. Jain; Environment models
and information assimilation, Technical Report RJ 6866
(65692), IBM Almaden Research Center, San Jose, Calif.,
1989; Y. Roth and R. Jain; Knowledge caching for
sensor-based systems, Artificial Intelligence, 71:257-280,
December 1994; and A. Katkere and R. Jain; A framework for
information assimilation, to be published in Exploratory
Vision edited by M. Landy, et al., 1994.

The independent processing results in pluggable and
dynamically reconfigurable processing tracks. The
preferred, prototypical, communication slave computers
perform the following steps on each individual video frame.

Video processing is limited by focus of attention rectangles
specified by the master computer, and pre-computed static
mask images delineating portions of a camera view which
cannot possibly have any interesting motion. The computation
of the former is done using current locations of the
object hypotheses in each view and projected locations in
the next view. The latter is currently created by hand,
painting out areas of each view not on the navigable surface
(walls, for example). Camera coverage tables help the
master computer in these computations. Coverage tables,
and the concept of objects, are both illustrated in FIG. 16.

In operation, the input frame is first smoothed to remove
some noise. Then the difference image d_{t,t-1} is computed as
follows. Only pixels that are in the focus of attention
windows and that are not masked are considered.

d_{t,t-1} = Threshold(Abs(F_{t-1} - F_t), threshold_value)

Optionally, to remove motion shadows, the following operation
is done:

[...]

This shadow-removing step is not invariably used nor
required since it needs a one frame look-ahead. In many
cases simple heuristics may be used to eliminate motion
shadows at a symbolic level.
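
The differencing step can be sketched as follows. This is a minimal
illustration in plain Python, not the prototype's code. Frames are
assumed to be lists of rows of gray values, a nonzero mask value is
assumed to mean "ignore this pixel", and the threshold comparison is
assumed to be strict. The shadow-removal operation is not legible in
this copy of the text; the function remove_motion_shadow below
substitutes one common approach that matches the stated need for a one
frame look-ahead (a logical AND of two consecutive difference images)
and should be read as an assumption, not as the disclosed operation.

```python
def difference_image(prev_frame, frame, threshold_value, mask=None, windows=None):
    """Binary image: 1 where |prev - current| exceeds threshold_value.

    A pixel is considered only if it is not masked out (mask[y][x] == 0)
    and lies inside some attention window (x0, y0, x1, y1), bounds inclusive.
    Passing None for mask or windows disables that restriction.
    """
    height, width = len(frame), len(frame[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if mask is not None and mask[y][x]:
                continue
            if windows is not None and not any(
                x0 <= x <= x1 and y0 <= y <= y1 for x0, y0, x1, y1 in windows
            ):
                continue
            if abs(prev_frame[y][x] - frame[y][x]) > threshold_value:
                out[y][x] = 1
    return out


def remove_motion_shadow(d_prev_cur, d_cur_next):
    """Assumed shadow removal: keep pixels that changed in both frame pairs."""
    return [
        [a & b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(d_prev_cur, d_cur_next)
    ]
```

For an object that moves one pixel per frame, a single difference image
marks both the position just vacated and the position just entered;
combining the two difference images that share the middle frame leaves
only the object's position in that middle frame.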

Next, components on the binary difference image are
computed based on a four-neighborhood criterion. Components
that are too small or too big are thrown away because they
usually constitute noise. Frames that contain a large number
of components are also discarded. Both centroid (from first
moments), and orientation and elongation (from the second
moments), are extracted for each component.

Next, several optional filters are applied at the slave site
to the list of components obtained from the previous step.
Commonly used filters include (i) merging of overlapping
bounding boxes, (ii) hard limits of orientation and
elongation, and (iii) distance from expected features etc.
Finally, the resulting list is sent back to the master site.
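
The component step can be sketched in plain Python as follows. The
function names, the parameter names, and the particular definition of
elongation (square root of the ratio of the principal second moments)
are assumptions made for illustration; the text states only that
centroid comes from first moments and that orientation and elongation
come from second moments.

```python
import math
from collections import deque


def label_components(binary, min_area=1, max_area=None, max_components=None):
    """Return 4-connected components of a binary image as lists of (x, y)."""
    height, width = len(binary), len(binary[0])
    seen = [[False] * width for _ in range(height)]
    found = []
    for y in range(height):
        for x in range(width):
            if not binary[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue, pixels = deque([(x, y)]), []
            while queue:
                cx, cy = queue.popleft()
                pixels.append((cx, cy))
                # Four-neighborhood: right, left, down, up (no diagonals).
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and binary[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            too_small = len(pixels) < min_area
            too_big = max_area is not None and len(pixels) > max_area
            if not (too_small or too_big):
                found.append(pixels)
    # A frame with too many components is treated as noise and discarded.
    if max_components is not None and len(found) > max_components:
        return []
    return found


def describe_component(pixels):
    """Centroid (first moments), orientation and elongation (second moments)."""
    n = len(pixels)
    cx = sum(x for x, _ in pixels) / n
    cy = sum(y for _, y in pixels) / n
    mxx = sum((x - cx) ** 2 for x, _ in pixels) / n
    myy = sum((y - cy) ** 2 for _, y in pixels) / n
    mxy = sum((x - cx) * (y - cy) for x, y in pixels) / n
    # Angle of the principal axis, measured from the x axis, in radians.
    orientation = 0.5 * math.atan2(2 * mxy, mxx - myy)
    # Eigenvalues of the 2x2 central second-moment matrix.
    spread = math.sqrt((mxx - myy) ** 2 + 4 * mxy ** 2)
    major = (mxx + myy + spread) / 2
    minor = (mxx + myy - spread) / 2
    if major <= 0:
        elongation = 1.0          # single pixel: no preferred direction
    elif minor <= 0:
        elongation = math.inf     # one-pixel-wide line
    else:
        elongation = math.sqrt(major / minor)
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    return {
        "centroid": (cx, cy),
        "orientation": orientation,
        "elongation": elongation,
        "bbox": (min(xs), min(ys), max(xs), max(ys)),
    }
```

The bounding box returned with each description is the quantity that
the optional filters (merging of overlapping boxes, limits on
orientation and elongation) would then operate on before the list is
returned to the master.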
9.1.7 Assimilation and Updating Object Hypotheses

The central visualization and modeling site receives
processed visual information from the video processing sites
and creates/updates object hypotheses. There are several
sophisticated ways of so doing. Currently, and for the sake
of simplicity in developing a completely operative
prototype, this is done as follows:

First, the list of two-dimensional (2-D) object bounding
boxes is further filtered based on global knowledge.

Second, the footprint of each bounding box is projected to
the primary surface of motion by intersecting a ray drawn
from the optic center of that particular camera through the
foot of the bounding box with the ground surface.

Third, each valid footprint is tested for membership with
existing objects and the observation is added as support to
the closest object, if any. If no object is close enough, [...]
(with appropriate weighting based on distance from the camera,
direction of motion etc.) to update the position of each object.

Fifth, the object positions are projected into the next
frame based on a domain-dependent tracker.

If events in the scene are to be recognized, object positions
and associations are compared against predetermined [...]. For
example, if in the courtyard scene the [...] moved into spatial
coincidence with one of the [...] immovable objects, such as a
bench, then the robot would have run into the bench, an abnormal
and [...]. For example, if in the scene of a [...] has moved in a
short time interval from spatial coincidence with a moving player
that was predetermined to be of a first team to spatial
coincidence [...] if the football is detected to have reversed
its direction of movement on the field, then any of a (i)
kickoff, (ii) fumble, or (iii) interception may have transpired.
If the [...] event is of interest to the viewer in the MPI video
system, then appropriate control signals are sent. Also, based on
the knowledge of static objects, if an actual or projected
position of a dynamic object intersects a static object then an
appropriate message may be sent. If the [...] football is
determined to be in spatial coincidence with the forty yard
marker, then it is [...] that the football is on the forty yard
line.

9.1.8 Results

[...] FIGS. 17 through 21 [...] frames in an exemplary [...] of
one thousand (1000) total frames from four different cameras
acquired as described in [...] multi-perspective perception
subsystem [...].

FIGS. 17 through 19 show the state of the subsystem at global
time 00:22:29:06. FIGS. 20 and 21 show the state of the subsystem
at global time [...]. In FIG. 17, four dynamic objects are shown
in the scene: a robot vehicle, two pedestrians and a bicyclist.
The scene is covered by four [...] of the [...] own clock, as is
shown under the camera's view in one of FIGS. 17a through 17d.
Camera number three (#3), which is arbitrarily known as [...], is
used to maintain the global clock since this camera has the
largest coverage and the best image quality. FIGS. 17a-17d
clearly show the coverage of each camera.

As shown in FIG. 17, an object that is out of view, too small,
and/or occluded from view in one camera is in view, large and/or
unoccluded to the view of another camera. Note that the object
labels used in FIG. 17 are for [...] only. The prototype
subsystem does not include any non-trivial object recognition,
and all object identifiers that persist over time are
automatically assigned by the [...]. Mnemonic names like "Walker
1", or "Walker", refer to the same object identification (e.g.,
what the software program would label "BasicEnvObject0023",
[...], etc.) over all the different frames of FIGS. 17-21.

A pictorial representation of the display screen showing the
operator interface to the multi-perspective perception subsystem
is shown in FIG. 18. Four camera views are shown in the top row
of FIG. 18. Each view is labeled using its mnemonic
identification instead of its numeric identification because
humans respond better to mnemonic "id's". Each view may be
associated with a one of FIGS. 17a-17d [...] different images
obtained with cameras of different [...] (huge variations in
color, color vs. monochrome), even when the object is just a few
pixels wide.

The bottom section of the operator display screen in FIG. 18
shows the hypotheses [...] several frames (first frame is global
clock 00:22:10:[...]). The intensity of the object's marker
represents the confidence in each hypothesis. The entire display
depicted, and the object hypothesis diagrammed and depicted, is,
as might well be expected, in full color. FIGS. 17-21 are
therefore monochrome renditions of color images. In particular,
the [...] yellow, and the intensity of the [...] object's marker
represents the confidence in the hypotheses [...] slight
differences in color intensity as correspond to differences in
confidence.

The multi-perspective perception subsystem has a high confidence
in each object for which [...] FIG. 18 because, at the particular
global time represented, each object happens to have been
observed from many cameras over several past frames.

The three-dimensional model at global time 00:22:29:06 is shown
in FIGS. 19a-19e [...] virtual views. FIGS. 19a-19d show the
model from the four real camera views. One-to-one correspondence
between [...] the camera views can be clearly seen. The fifth
view of FIG. 19e is a virtual view of the model from directly
overhead the courtyard, where no real camera actually exists.
This virtual view shows the exact locations of all three objects,
including the robotic vehicle, in the two-dimensional plane of
the courtyard. Three objects are very accurately localized. The
fourth object, Walker Number Two (#2) in FIGS. 17 and 18, has
some error in localization since this person is (i) not visible
in Camera number four (#4) [...] is very small in Cameras numbers
two and three (#2 & #3), hence leading to some errors. [...]
observation is not used since its [...] bottom of the image.
Obviously when [...] box intersects the bottom of the image, its
full extent cannot be determined and should be ignored. To show
the development of object [...] experiment is taken ten (10)
seconds [...]. FIG. 20 corresponds to FIG. 18 while FIG. 21
corresponds to FIG. 19. One important observation to make in
FIGS. 20 and 21 is that [...] of Walker number one (#1) and
Bicyclist number one (#1), both are still classified as separate
objects. This is only possible due to the subsystem's history and
tracking mechanism.

9.2 Applications

In addition to multi-perspective [...], a variety of other
application [...] global multi-perspective perception subsystem
described. For instance, environments demanding sophisticated
visual monitoring, such as airport runways and hazardous or
complex roadway traffic situations, can advantageously use the
global multi-perspective perception subsystem. In these
environments, as in MPI video, objects must be recognized and
identified, and spatial-temporal information about objects'
locations and behaviors [...].

The expected first application of the global multi-perspective
perception subsystem to the MPI video system has been in sports,
and it is expected that sports and other
entertainment applications (which greatly benefit) will be
the first commercial application of the subsystem/system.
Sports events, e.g. football games, are already commonly
imaged with video cameras from several different spatial
perspectives, as many as several dozen such for a major
professional football game. The reason that still more
cameras are not used is primarily perceived as having to do
with the expense of such human cameramen as are required to
focus the camera image on the "action", and not the cost of the
camera. Additionally, it is uncertain how many different
"feeds" a sports editor can use and select amongst,
especially in real time. The reason the televised sporting
event viewing public is by and large satisfied with the
coverage offered is that they have never seen anything
better, including in the movies. Few people have been
privileged to edit a movie or a video, and even fewer to their
own personal taste (no matter how weird, or deviant). The
machine-based MPI video of the present invention will, of
course, accord viewing diversity without the substantial
expense of human labor.

Consider that in using the global multi-perspective
perception subsystem and the MPI video system, multiple video
perspectives are integrated into a single comprehensive
model of the action. Such a representation can initially assist
a number of video editors in choosing between different
perspectives, for example a video edited for the "defense"
and one for the "offense" and one for the "offensive
receivers", etc., as well as the standard "whole game" video
editor. Ultimately, and with increasingly affordable
computer power, even a regular viewer who is interested, for
example, in a particular player would be able to customize
his video display based on that player. Interactive video
applications such as these will greatly benefit from, and will
use, both the global multi-perspective perception subsystem
and the MPI video system.

Still another application where the global
multi-perspective perception subsystem may be used directly is as
a tour guide in a museum or any such confined space. Rather
than moving objects in the scene (i.e., the courtyard, or the
football field), the scene can remain fixed (i.e., the
museum) and the camera can move. The response accorded
a museum visitor/video camera user will be even more
powerful than, for example, the hypertext linkage on the
World Wide Web of the Internet. On an interactive computer
screen and system (whether on the Internet or not) a
viewer/user can point and click his/her way to additional
information. However, the viewer/user is viewing a video
representation of museum art, and not the real thing.

Consider now a visit to a museum of art using, instead of
a self-guided tour headset, a hand-held video camera. The
user/viewer can go anywhere that he or she wants within the
galleries of the museum, and can point at any art work, to
perhaps show not only the scene at hand in the viewfinder of
his or her video camera, but perhaps also a video and/or
audio overlay that has interactively been sent to the user's
video camera from "computer central". The "computer
central" recognizes where in the museum the user's video
(which is also transmitted out to the "computer central")
arises from. Simple "helps" in the gallery rooms, such as bar
codes, may perhaps help the "computer central" to better
recognize where an individual user is, and in what direction
the user is pointing. So far this scheme may not seem much
different, and potentially more complex and expensive, than
simply having a user-initiated information playback system
at each painting (although problems of time synchronization
for multiple simultaneous viewers may be encountered with
such a system).

The advantage that the global multi-perspective perception
subsystem offers in the art museum environment is that
accumulation of a "user track", instead of an "object track",
becomes trivial. The user may be guided in a generally
non-repetitious track through the galleries. If he/she stops
and lingers for a one artist, or a one subject matter, or a style,
or a period, etc., then selected further works of the artist,
subject matter, style, period, etc., that seem to command the
user's interest may be highlighted to the user. If the user
dwells at length at a single work, or at a portion thereof, then
the central computer can perhaps send textual or audio
information so regarding. If the user fidgets, or moves on,
then the provided information is obviously of no interest to
the user, and may be terminated. If the user listens and views
through all offered messages that are classified "historical
perspective of the persons and things depicted in the art
work viewed", then it might reasonably be assumed that the
user is interested in history. If, on the contrary, the user
listens and views through all offered messages that are
classified "life of the artist", then it might reasonably be
assumed that the user is interested in biography.

9.3 Conclusions, and Future Developments, Concerning the
Global Multi-perspective Perception Subsystem

The complex phenomena of "man-machine information
systems of the future" discussed in the immediately
preceding section may seem all "fine and good", or even
fascinating, but some minutes of deliberation are likely
required to understand exactly what this all has to do with
the present invention. In the simplest possible terms,
information (and a great, great deal of such information,
indeed) comes to a camera, which is the best present
machine substitute for human vision, in the form of
two-dimensional images. However, our own human vision is
stereoscopic, and our eye/brain combination is perceptive of
not two, but three, dimensions. We reason things out
spatially in three dimensions, and we are interested in what
goes on in three dimensions (as at a real live football game) as
well as in two dimensions (as in the presentation of a
football game on television). (We are also interested in
smelling, tasting and/or hearing concurrently with our
viewing, but the present invention cannot do anything about
satisfying this desire.)

It is the teaching of the present Invention, broadly 
speaking, that in order to best serve man, machine system 
that convey visual information ought to, if at all possible or 

45 practical, *Yise to the level" of tfaree-dim^sional informa- 
tion. The machine system would desirably so rise not in the 
images that it displays to viewers (which displayed images 
will, alas, remain two-dimensional for the foreseeable 
future) but instead, in the construction and managenoent of 

so a datat>ase from which information can be drawn. Moreover, 
if this three*dinkcnsional database is good enough, and if die 
machine (computer) processes that operate upon it are clever 
enough, &en the power, and the flexibility, or viewer 
service, and presentations, are magnified. This magnification 

55 is in the same sense that we get more out of life by operating 
as autonomous agents in the three-dimensional worid than 
we would if wc could view all the cinema of the world for 
free forever in a darkened room. If a human cannot interact 
with his/her environment — even as viewed, when necessary, 

60 through a two-dimensional window — then some of the 
essence of living is surely lost 

It is the teaching of the present invention how to so
construct from multiple two-dimensional video images a
three-dimensional database, and how to so manage the
three-dimensional database for the production of
two-dimensional video images that are not necessarily those images
from which the database was constructed.

Future improvements to the global multi-perspective
perception system will involve building on the complete
framework provided in this specification. [...]
two dimensional motion detection and tracking, three
dimensional integration and tracking, etc. are possible.
Another important extension of the present invention would
be to use cooperative active cameras for enhanced tracking of
robots and other moving objects over wide areas. This
approach could both (i) reduce the number of cameras
required to cover an area, and (ii) improve object detection
and recognition by keeping objects towards center of view.

Future improvements to the global multi-perspective
perception subsystem may also be taken in the area of
cooperative human-machine systems. Interactivity at the central
site might be improved so as to permit a human to perform
higher-level cognitive tasks than simply asking "who",
"what/where", or "when". The human might ask, for
example, "why?". In the context of football, and for the
event of a tackle, the machine (the computer) might be able
to advance as a possible answer (which would not invariably
be correct) to the question "why (the tackle)?" something
like: "Defensive linebacker #24 at the (site of) tackle has
not been impeded in his motion since the start of the play."
The machine has sensed that linebacker #24, who may or
may not have actually made the tackle but who was
apparently nearby, was not in contact with any defensive player
prior to the tackle. In a highest-level interpretation of this
event as would be, and as of the present can be, rendered
only by a human being, the likely interpretation of this
sequence, as was recognized by the machine, is that
someone has missed a tackle.

10 The Particular, Rudimentary Embodiment of the
Invention Taught Within This Specification

[...] described manipulation and synthesis is of recorded
video [...]

The rudimentary nature of the particular embodiment
taught within this specification further dictates [...]
that the extraction of some [...] features [...]
images is not only not in real time, but is in fact done by
manual means, just once "before the game". Moreover, they
are easily captured by even the most rudimentary machine
[...] motion [...] are much harder to extract, especially at
high [...] and most especially in real time. To extract these
[...] features enters the realm of machine vision.
Nonetheless that this portion of the system of the present
invention is challenging, many simple machine solutions,
ranging from fluorescently bar-coded objects in the scene
(e.g. players and football) to [...] machine vision programs
[...] each video scene [...] "tracked" objects (the players)
are only viewed later, [...] tape or CD-ROM. Accordingly, it
is respectfully suggested that the [...] of the present
invention is not [...] by certain practical limitations, as of
present, on the particular image extraction function performed
in the rudimentary embodiment of the invention.

In the particular, rudimentary, embodiment of the
invention taught in this specification the synthesized video
[...] completely of a virtual camera/image that may [...] or
anywhere, but is instead of a machine-determined
most appropriate real-world camera. This may initially seem
[...] significant, and substantive, curtailment of the
[...] of the present invention. However, important
factors should be recognized. First, the [...] of multiple
images, even video images, to [...] "morphing", and is, circa
[...] rudimentary [...] of the present invention does proceed
to perform this [...] performed on the [...] workstation on
which the rudimentary embodiment of the invention has been
fully operationally [...]. Another single reason that the
rudimentary system of the present invention does proceed to
perform this [...] example of American football initially
dealt with by the system and method of the present invention,
it is uncertain whether this expensive, and computationally
extensive, step (which turns out to be a final [...]
selection/morphing function performed in the [...] multiple
real video cameras.

[...] computationally intensive, the [...] is usefully
powerful and is, in the preferred [...] engineering
workstation. [...] the present can usefully be very powerful,
and can usefully [...] control.

As explained, the present invention has not been, to the
present date of filing, implemented at its "full blown" level
of interactive virtual television. It need not be [...]
it may be understood as a coherent, logical, and useful
scheme of so implementing virtual video/television.

10.1 Directions of Future Development

This specification has described the development and
actual use of a prototype football video retrieval system.
This system serves to demonstrate the concepts and the
potential of MPI video. The feasibility of the broader
concepts is completely demonstrated. Design and
implementation of MPI video for longer sequences of football,
and also for other applications, is still proceeding as of the
filing date.

However, as is also clear from the present specification,
the MPI video system is in its infancy. The potential of the
MPI video techniques is obvious, but cost effective
implementation remains to be achieved. Almost all medium- to
large-scale computer technology involved in the implementation
of the prototype MPI video systems was stretched to its limits.
The following are only a few examples of the useful, and
probable, future developments and enhancements.

10.1.1 Scene Analysis 

In the prototype MPI video system, much information
was inserted manually by an operator. However, to make
MPI video practical for commercial use, this process should
be automated as much as possible. (Notice that it is not
necessary that MPI video should invariably be so automated
in order to be used. Certain very crucial or interesting events
for which multiple video images exist, such as key plays in
sporting events, may be well deserving of careful analysis
after the fact.)

Also, and as may be recalled, it was found to be difficult
to determine camera status for some video frames which
contain very few known points to calibrate. This problem
may be solved by using information obtained from other
video frames, both of other cameras in the same instant
and/or of the same camera in the instants before and after.
Once this technology becomes practical, it will be possible
to structure many other items and objects to simplify the
object recognition task.
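
One way to read the suggestion above, borrowing camera status from the same camera in the instants before and after, is a simple interpolation between the nearest well-calibrated frames. The sketch below is illustrative only: the `CameraStatus` fields (`pan`, `tilt`, `zoom`) and the function name are hypothetical, the camera position is assumed fixed, and the specification does not say that linear interpolation is the method actually used.

```python
from dataclasses import dataclass


@dataclass
class CameraStatus:
    """Hypothetical camera status; field names are assumptions."""
    pan: float   # degrees
    tilt: float  # degrees
    zoom: float  # focal-length-like parameter


def interpolate_status(before, after, t_before, t_after, t):
    """Estimate status at time t from calibrated frames at t_before and t_after.

    Plain linear interpolation; assumes the camera moved smoothly
    between the two calibrated instants.
    """
    if not t_before < t_after:
        raise ValueError("t_before must be earlier than t_after")
    w = (t - t_before) / (t_after - t_before)
    return CameraStatus(
        pan=before.pan + w * (after.pan - before.pan),
        tilt=before.tilt + w * (after.tilt - before.tilt),
        zoom=before.zoom + w * (after.zoom - before.zoom),
    )


if __name__ == "__main__":
    a = CameraStatus(pan=10.0, tilt=0.0, zoom=1.0)
    b = CameraStatus(pan=20.0, tilt=4.0, zoom=3.0)
    print(interpolate_status(a, b, t_before=0.0, t_after=10.0, t=5.0))
```

Information from other cameras at the same instant would need a different treatment (for example, shared scene points), which this sketch does not attempt.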

10.1.2 Data Modeling and Indexing 

Information structure that is contained in a scene is
usually complicated, and the amount of information in the
scene is huge. Moreover, this video information is developed
and received over but a short period of time. To deal with
various types of queries, good data modeling is required. See
Amarnath Gupta, Terry Weymouth, and Ramesh Jain;
"Semantic queries with pictures: the VIMSYS model"
appearing in Proceedings of the 17th International Conference
on Very Large Data Bases, September 1991.

To enable the best quick response to the queries, indexing 
techniques will be required. These techniques for images 
and video are just being developed. 

10.1.3 The Human Interface

The present specification has taught that interaction using
a three-dimensional cursor is a good way for a user/viewer to
point or highlight objects in three-dimensional space.
However, in the field of entertainment and training, where
interactive video is expected to be useful, an even more
friendly interface is desired. Techniques to specify camera
location, describe events of interest, and other similar things
need further development. In many applications, like
"telepresence", one may require extensive use of virtual
reality environments. In applications like digital libraries,
strong emphasis on user modeling will be essential.

Notwithstanding the potential for improving, and rendering
more abstract, the user/viewer interface in some
applications, this interface is most assuredly not a "weak
point" of the present invention of MPI video. Indeed, it is
difficult to even imagine how new and improved user/viewer
interface tools may be used in the context of interactive
movies and similar other applications of MPI video. It seems
as if the tools that the user/viewer might reasonably require
are already available right now.
10.1.4 Video Databases 

As access to data from more and more cameras is
permitted, the storage requirements for MPI video will
increase significantly. Where and how to store this video
data, and how to organize it for timely retrieval, is likely to
be a major issue for expansion and extension of the MPI
video system. In the prototype system, the single most
critical problem has been the storage of data. Future MPI
video will continue to put tremendous demands on the
capacity and efficiency of organization of the storage and
database systems.

10.2 Recapitulation of the Invention

In one, rudimentary, embodiment of the present invention, a
virtual video camera, and a virtual video image, of a scene
were synthesized in a computer and in a computer system
from multiple real video images of the scene that were
obtained by multiple real video cameras.

This synthesis of a virtual video image was computationally
intensive. Depending upon how extensively and how
fast (i) three-dimensional analysis of the multiple scenes is
to transpire, (ii) information from the multiple scenes is to
be extracted, and (iii) linkage between the multiple scenes is
to be established, the computer and computer system
realizing the present invention can usefully be very powerful,
and can usefully exercise certain exotic software functions in
the areas of machine vision, scene and feature analysis, and
interactive control. In the prototype system,
network-connected engineering workstations that were relatively
new as of the 1995 date of filing were used.

Notably, however, the present invention need not be (and
to the present date of filing has not been) implemented at its
"full blown" level of interactive virtual television in order
that it may be recognized that a coherent, logical, and useful
scheme of implementing virtual video/television is shown and
taught.

The virtual video camera, and virtual image, produced by
the MPI video system need not and commonly does not
have any real-world counterpart. The virtual video camera
and virtual image may show, for example, a view of a
sporting event, for example American football, from an
aerial, or an on-field, perspective at which no real camera
exists or can exist.

In advanced, computationally intensive, form the virtual
camera/virtual image can be computer synthesized in real
time, producing virtual television.

The synthesis of virtual video images/virtual television
pictures may be linked to any of (i) a perspective, (ii) an
object in the video/television scene, or (iii) an event in the
video/television scene. The linkage may be to a static, or a
dynamic, (i) perspective, (ii) object or (iii) event. For
example, the virtual video/television camera could be
located (i) statically at the line of scrimmage, (ii)
dynamically behind the halfback wheresoever he might go, or
(iii) dynamically on the football wheresoever it might go, in a
video/television presentation of a game of American
football.
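
The distinction between a static and a dynamic linkage can be pictured as two kinds of camera anchor: one that always returns the same position, and one that follows a tracked object with a fixed offset. The following is a minimal sketch under assumed names; `static_anchor`, `following_anchor`, and the toy football track are hypothetical and are not taken from the specification.

```python
from typing import Callable, Tuple

Point = Tuple[float, float, float]
Anchor = Callable[[float], Point]  # maps time (seconds) to a camera position


def static_anchor(position: Point) -> Anchor:
    """Camera fixed at one place, e.g. at the line of scrimmage."""
    return lambda t: position


def following_anchor(track: Callable[[float], Point], offset: Point) -> Anchor:
    """Camera that follows a tracked object, displaced by a fixed offset."""
    def anchor(t: float) -> Point:
        x, y, z = track(t)
        dx, dy, dz = offset
        return (x + dx, y + dy, z + dz)
    return anchor


if __name__ == "__main__":
    # Hypothetical track: the football moving down the field at 10 units/s.
    def football(t: float) -> Point:
        return (10.0 * t, 0.0, 1.0)

    scrimmage_cam = static_anchor((50.0, 0.0, 2.0))
    ball_cam = following_anchor(football, offset=(-5.0, 0.0, 2.0))
    for t in (0.0, 1.0, 2.0):
        print(t, scrimmage_cam(t), ball_cam(t))
```

An event-keyed linkage could be layered on top by switching anchors when an event is recognized; that step is left out here.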

The virtual camera, and virtual image, that is synthesized
from multiple real world video images may be so
synthesized interactively, and on demand. For example, and in
early deployments of the system of the invention, a
television sports director might select a virtual video replay of a
play in a football game keyed on a perspective, player or
event, or might even so key a selected perspective of an
upcoming play to be synthesized in real time, and shown as
virtual television.

Ultimately, many separate viewers are able to select, as
sports fans, their desired virtual images. For example, a
virtual video replay, or even a virtual television, image of
each of the eleven players on each of two American football
teams, plus the image of the football, is carried on
twenty-three television channels. The "fan" can thus follow his
favorite player.

Ultimate interactive control where each "fan" can be his
own sports director is possible, but demands that
considerable image data (actually, three-dimensional image data) be
delivered to the "fan" either non-real time in batch (e.g., on
CD-ROM), or in real time (e.g., by fiber optics), and, also,
that the "fan" should have a powerful computer (e.g., an
engineering workstation, circa 1995).

In accordance with the preceding explanation, variations
and adaptations of Multiple Perspective Interactive (MPI)
video in accordance with the present invention will suggest
themselves to a practitioner of the digital imaging arts. For
example, monitors of the positions of the eyes might "feed
back" into the view presented by the MPI video system in a
manner more akin to "flying" in a virtual reality landscape
than watching a football game, even as a live spectator. It
may be possible for a viewer to "swoop" onto the playing
field, to "circle" the stadium, and even, having crossed over
to the "other side" of the stadium, to pause for a look at that
side's cheerleaders.

In accordance with these and other possible variations and
adaptations of the present invention, the scope of the
invention should be determined in accordance with the following
claims, only, and not solely in accordance with that
embodiment within which the invention has been taught.

What is claimed is: 

1. A method of presenting to a viewer a particular
two-dimensional video image of a real-world three dimensional
scene containing an object comprising:

imaging in multiple video cameras each at a different
spatial location multiple two-dimensional images of a
real-world scene each at a different spatial perspective
not all of which scene perspectives may always and
invariably show the object in the scene;

combining in a computer the multiple two-dimensional
images of the scene into a three-dimensional model of
the scene so as to generate a three-dimensional model
of the scene in which model the object in the scene is
identified;

selecting in the computer from the three-dimensional
model a particular two-dimensional image of the scene,
corresponding to one of the images of the real-world
scene that is imaged by one of the multiple video
cameras, showing the object; and

displaying in a video display the particular
two-dimensional image of the real-world scene showing the
object to the viewer.

2. The method according to claim 1
wherein the combining is so as to generate a
three-dimensional model of the scene in which model objects
in the scene are identified;

wherein the receiving is of the viewer-specified criterion
of a selected object that the viewer wishes to
particularly view within the scene; and

wherein the selecting in the computer from the
three-dimensional model is of a particular two-dimensional
image of the selected object in the scene; and

wherein the displaying in the video display is of the
particular two-dimensional image of the scene showing
the viewer-selected object.



3. The method according to claim 2 wherein the
viewer-selected object in the scene is static, and unmoving, in the
scene.

4. The method according to claim 2 wherein the
viewer-selected object in the scene is dynamic, and moving, in the
scene.

5. The method according to claim 2 wherein the viewer
selects the object that he or she wishes to particularly view
in the scene by act of positioning a cursor on the video
display, which cursor unambiguously specifies an object in
the scene by an association between the object position and
the cursor position in three dimensions and is thus a
three-dimensional cursor.
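
The association between cursor position and object position in three dimensions can be illustrated as a nearest-neighbour lookup in the scene model. This is only a sketch under assumptions: the function name `pick_object`, the `max_distance` tolerance, and the dictionary of object positions are hypothetical, and the specification does not state that Euclidean distance is the association rule used.

```python
import math


def pick_object(cursor, objects, max_distance):
    """Return the name of the object nearest the 3-D cursor, or None.

    cursor       -- (x, y, z) position of the cursor in world coordinates
    objects      -- dict mapping object name to its (x, y, z) position
    max_distance -- assumed tolerance; objects farther away are ignored
    """
    best_name, best_dist = None, max_distance
    for name, position in objects.items():
        d = math.dist(cursor, position)  # Euclidean distance (Python 3.8+)
        if d <= best_dist:
            best_name, best_dist = name, d
    return best_name


if __name__ == "__main__":
    scene = {"player_24": (10.0, 5.0, 0.0), "football": (12.0, 5.0, 1.0)}
    print(pick_object((11.8, 5.0, 1.0), scene, max_distance=1.0))
```

Because the comparison is made in world coordinates rather than on the screen, two objects that overlap in a two-dimensional image can still be told apart by depth.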

6. The method according to claim 1 performed in real time
as television presented to a viewer interactively in
accordance with the viewer-specified criterion.

7. The method according to claim 1 applied to a
real-world three dimensional scene containing a plurality of
objects

wherein the imaging in multiple video cameras each at a
different spatial location is of multiple two-dimensional
images of the real-world scene containing the plurality
of objects each at a different spatial perspective;
wherein the combining in a computer of the multiple
two-dimensional images of the scene into a
three-dimensional model of the scene is so as to generate a
three-dimensional model of the scene in which model
the plurality of objects in the scene are identified;
and wherein, before the selecting, the method further
comprises:

receiving in the computer from a prospective viewer of
the scene a viewer-specified criterion of a particular
one of the plurality of scene objects relative to which
particular one object the viewer wishes to view the
scene;

wherein the selecting in the computer from the
three-dimensional model is of a particular
two-dimensional image of the scene, corresponding to
one of the images of the real-world scene that is
imaged by one of the multiple video cameras,
showing the viewer-selected object; and
wherein the displaying in a video display is of the particular
two-dimensional image of the real-world scene
showing the viewer-selected object to the viewer.
8. A method of presenting to a viewer a particular
two-dimensional video image of a real-world three dimensional
scene containing an object, the method comprising:
imaging in multiple video cameras each at a different
spatial location multiple two-dimensional images of the
real-world scene containing the object each at a
different spatial perspective;
combining in a computer the multiple two-dimensional
images of the scene into a three-dimensional model of
the scene containing the scene object;
receiving in the computer from a prospective viewer of
the scene a viewer-specified particular spatial
perspective, relative to which particular spatial
perspective the viewer wishes to view the object in the
scene;

selecting in the computer from the three-dimensional
model a particular two-dimensional image of the scene
corresponding to one of the images of the real-world
scene that is imaged by one of the multiple video
cameras in accordance with the particular spatial
perspective received from the viewer, this selected image
being an actual image of the scene, out of all the actual
images of the scene as were imaged by all the multiple
video cameras, that most closely shows the object in
accordance with the particular spatial perspective
criterion received from the viewer; and
displaying in a video display the particular
two-dimensional image of the real-world scene showing the
object at the desired spatial perspective to the viewer.
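
The selecting step of claim 8 picks, from the real cameras, the one whose image most closely matches the perspective the viewer asked for. One plausible reading, offered here only as a sketch, scores each camera by the angle between its line of sight to the object and the requested viewing direction. The names `select_camera` and `desired_dir` are hypothetical, and occlusion, zoom, and field of view are deliberately ignored.

```python
import math


def _unit(v):
    """Normalize a 3-vector; a zero vector has no direction."""
    n = math.sqrt(sum(c * c for c in v))
    if n == 0.0:
        raise ValueError("zero-length direction")
    return tuple(c / n for c in v)


def select_camera(cameras, object_pos, desired_dir):
    """Pick the camera whose view of the object best matches desired_dir.

    cameras     -- dict mapping camera name to its (x, y, z) position
    object_pos  -- (x, y, z) position of the object in the scene model
    desired_dir -- direction in which the viewer wishes to look at the object
    """
    want = _unit(desired_dir)
    best_name, best_cos = None, -2.0
    for name, cam_pos in cameras.items():
        view = _unit(tuple(o - c for o, c in zip(object_pos, cam_pos)))
        cos_angle = sum(a * b for a, b in zip(want, view))
        if cos_angle > best_cos:  # larger cosine means smaller angle
            best_name, best_cos = name, cos_angle
    return best_name


if __name__ == "__main__":
    cams = {
        "north": (0.0, 50.0, 10.0),
        "south": (0.0, -50.0, 10.0),
        "east": (50.0, 0.0, 10.0),
    }
    # Viewer wants to look toward +y, so the south camera fits best.
    print(select_camera(cams, (0.0, 0.0, 0.0), (0.0, 1.0, 0.0)))
```

A fuller version would first discard cameras in which the object is not visible, consistent with claim 1's observation that not all perspectives invariably show the object.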

9. The method according to claim 8

wherein the selecting is, over time, of plural actual images
of the scene as are imaged, over time, by plural ones of
the multiple video cameras;

wherein the computer does not invariably select from the
three-dimensional model an image that arises from one
only of the multiple video cameras, but instead selects
plural images as arise over time from plural ones of the
multiple video cameras.

10. The method of presenting to a viewer a particular
two-dimensional video image of a real-world three
dimensional scene according to claim 8 applied to a scene
containing a moving object

wherein the imaging in multiple video cameras each at a
different spatial location multiple two-dimensional
images is of the real-world scene containing the
moving object each at a different spatial perspective;

wherein the combining in a computer of the multiple
two-dimensional images of the scene is into a
three-dimensional model of the scene containing the moving
object;

wherein the receiving in the computer from the
prospective viewer of the scene is of a viewer-specified
particular spatial perspective relative to which particular
spatial perspective the viewer wishes to view the
moving object in the scene;

wherein the selecting in the computer from the
three-dimensional model is of a particular two-dimensional
image of the scene corresponding to one of the images
of the real-world scene that is imaged by one of the
multiple video cameras in accordance with the
particular spatial perspective received from the viewer, this
selected image being an actual image of the scene, out
of all the actual images of the scene as were imaged by
all the multiple video cameras, that most closely
shows the moving object in accordance with the
particular spatial perspective criterion received from the
viewer; and

wherein the displaying in a video display is of the particular
two-dimensional image of the real-world scene
showing the moving object at the desired spatial perspective
to the viewer.

11. A method of presenting a particular two-dimensional
video image of a real-world three dimensional scene to a
viewer comprising:

imaging in multiple video cameras each at a different
spatial location multiple two-dimensional images of a
real-world scene each at a different spatial perspective;
combining in a computer the multiple two-dimensional
images of the scene into a three-dimensional model of
the scene so as to generate a three-dimensional model
of the scene in which model events in the scene are
identified;

receiving in the computer from a prospective viewer of
the scene a viewer-specified criterion of a selected
event that the viewer wishes to particularly view the
scene;

selecting in the computer from the three-dimensional
model in accordance with the viewer-specified criterion
a particular two-dimensional image of the scene,
corresponding to one of the images of the real-world scene
that is imaged by one of the multiple video cameras,
showing the viewer-selected event; and
displaying in a video display the particular
two-dimensional image of the real-world scene showing the
viewer-selected event to the viewer.

12. The method according to claim 11 wherein the viewer
selects the event that he or she wishes to particularly view
in the scene by act of positioning a cursor on the video
display, which cursor unambiguously specifies an event in
the scene by an association between the event position and
the cursor position in three dimensions and is thus a
three-dimensional cursor.

13. A method of selecting a video image showing a one
object from multiple real video images obtained by a
multiplicity of real video cameras showing a scene containing
multiple objects, the method comprising:

storing in a video image database the real
two-dimensional video images of the scene containing
multiple objects as the video images arise from each of
a multiplicity of real video cameras;
creating in a computer from the multiplicity of stored
two-dimensional video images a three-dimensional
video database containing a three-dimensional video
image of the scene;
selecting in the computer a real two-dimensional video
image of the scene showing the one object from the
three-dimensional video database; and
displaying the selected real two-dimensional video image.

14. The method according to claim 13 wherein the
generating comprises:

synthesizing from the three-dimensional video database a
two-dimensional virtual video image of the scene that
is without correspondence to any real two-dimensional
video image of a scene.

15. The method according to claim 13 further comprising:
receiving in the computer a criterion of a spatial
perspective, which spatial perspective is not that of any
of the multiplicity of real video cameras, on the scene
as is imaged within the three-dimensional video
database;

wherein the selecting of the two-dimensional virtual video
image is so as to best approximate showing the one
object in the scene from the received spatial
perspective.

16. The method according to claim 15 wherein the
received spatial perspective is static and fixed, during the
video of the scene.

17. The method according to claim 15 wherein the
received spatial perspective is dynamic, and variable, during
the video of the scene.

18. The method according to claim 15 wherein the
received spatial perspective is so dynamic and variable
dependent upon occurrences in the scene.

19. The method according to claim 13 that, between the
creating and the selecting, further comprises:

locating a selected object in the scene as is imaged within
the three-dimensional video database;
wherein the selecting of the two-dimensional virtual video
image is so as to best show the selected object.

20. The method according to claim 13 that, between the
creating and the selecting, further comprises:

dynamically tracking the scene as is imaged within the
three-dimensional video database in order to recognize
any occurrence of a predetermined event in the scene;
wherein the selecting of the two-dimensional virtual video
image is so as to best show the predetermined event.

21. The method according to claim 13 wherein the
selecting of the two-dimensional virtual video image is on
demand.

22. The method according to claim 13 wherein the
selecting of the two-dimensional video image is in real time on
demand, thus interactive television.

23. A system for presenting video images of a real-world
scene containing a plurality of objects in accordance with a
predetermined criterion, the system comprising:
multiple video imagers each at a different spatial location
for producing multiple two-dimensional video images
of the real-world scene each at a different spatial
perspective;

a viewer interface at which a prospective viewer of the
scene may specify a criterion designating a particular
one of the plurality of objects relative to which
particular one object in the scene the viewer wishes to
view the scene;
a computer, receiving the multiple two-dimensional video
images of the scene from the multiple video imagers
and the viewer-specified criterion from the viewer
interface,
for producing from the multiple two-dimensional video
images of the scene a three-dimensional model of the
scene; and

for selecting from the three-dimensional model a
particular two-dimensional video image of the scene
showing the viewer-selected object in accordance
with the viewer-specified criterion; and
a video display, receiving the particular two-dimensional
video image of the scene from the computer, for
displaying the particular two-dimensional video image
of the real-world scene showing the viewer-selected
object to the viewer.

24. The system according to claim 23 wherein the
multiple video imagers comprise:

multiple video cameras, each having an orientation and a
lens parameter and a location that is separate from all
other video cameras, each for producing a raw video
image; and

a camera scene builder computer, receiving the multiple
raw video images from the multiple video cameras, for
selecting in consideration of the orientation, the lens
parameter, and the location of each of the multiple
video cameras, two-dimensional video images of a
real-world scene that are of a known spatial
relationship, as well as at a different spatial perspective,
one to the next;

wherein the spatial positions of all the multiple
two-dimensional video images of a real-world scene
are known.

25. The system according to claim 23

wherein the viewer interface has and presents a
three-dimensional cursor manipulatable by a prospective
viewer of the scene so as to unambiguously specify any
object in the scene even when the specified object is
partially obscured by other objects in the scene.

26. A method of building a three-dimensional video
model of a three-dimensional real-world scene, and of
extracting video information regarding the real world scene
from the model built, the method comprising:

imaging in multiple video cameras multiple frames of
two-dimensional video of the three-dimensional real
world scene, the two-dimensional frames from each
camera arising from a unique spatial perspective on the
scene;
first-analyzing the scene in two dimensions by extracting
feature points from the two-dimensional video frames
in order to annotate the two-dimensional video frames
by certain image information contained therein, thus
producing multiple annotated two-dimensional video
frames;

second-analyzing in a computer the scene in three
dimensions by

transforming the multiple annotated two-dimensional
video frames into a three-dimensional video model
in which model is contained three-dimensional video
of the scene, while
extracting and correlating information from the
annotated two-dimensional video frames so as to annotate
the three-dimensional video model of the scene with
such information, thus producing a
three-dimensional video model annotated with scene
image information, thus producing an annotated
three dimensional video model;
selecting in a computer from the annotated
three-dimensional video model (i) a two-dimensional video
image (ii) in accordance with some criterion
interpretable and interpreted by reference to the scene image
information, thus producing a selected
two-dimensional video image; and
displaying in a display the selected two-dimensional
video image;

wherein frames from multiple video cameras were
first-analyzed in order to produce the annotated
two-dimensional video frames;

wherein the annotated two-dimensional video frames
were themselves second-analyzed to produce the
annotated three-dimensional video model;

wherein the interpreting, in the selecting step, of the
criterion by reference to the three-dimensional scene
image information is thus, ultimately, an interpretation
by reference to scene image information that arose
from multiple video cameras;

wherein the image displayed is selected by reference to
scene image information that arose from more than
just one video camera, and, indeed, is selected by
reference to scene image information that arose from
multiple video cameras.

27. The method according to claim 26

wherein the imaging is of the three-dimensional real
world scene having coordinates (x,y,z) by multiple
cameras each having reference frame coordinates
(p,q,s) that are different than are the camera reference frame
coordinates of any other camera so as to produce
multiple frames of two-dimensional video each having
coordinates (p,q);

wherein the first-analyzing extracts feature points of
coordinates (p0,q0) from the two-dimensional video frames;

wherein the second-analyzing serves to produce the three- 
dimensional video model of the scene 
by transforming a point (x,y,z) in the world coordinate 
system to a point (p,q,s) in the camera coordinate 
system by 
where R is a transformation matrix from the world 
coordinate system to the camera coordinate system, 
and (x0,y0,z0) is the position of the camera, and 
by projecting a point (p,q,s) in the camera coordinate 
system to a point (u,v) on the image plane according 
to 

\[ \begin{bmatrix} u \\ v \end{bmatrix} = \frac{f}{s} \begin{bmatrix} p \\ q \end{bmatrix} \]

where f is a camera parameter that determines the 
degree of zoom in or zoom out; 
wherein an image coordinate (u,v) that corresponds to 
world coordinate (x,y,z) is determined depending on 
the (i) camera position, (ii) camera angle and (iii) 
camera parameter. 
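
The two-stage mapping recited in claim 27 can be sketched in code. This is an illustration under stated assumptions, not the patent's implementation: the first equation of the claim is not legible in this copy, so the form (p,q,s) = R((x,y,z) - (x0,y0,z0)) is assumed here because it matches the stated roles of R and the camera position. The function names are hypothetical.

```python
def world_to_camera(R, camera_pos, point):
    """Map a world point (x,y,z) to camera coordinates (p,q,s).

    ASSUMPTION: the transform is R applied to the offset from the camera
    position (x0,y0,z0); the claim's own equation is missing in this copy.
    R is a 3x3 matrix given as nested lists.
    """
    d = [point[i] - camera_pos[i] for i in range(3)]
    return tuple(sum(R[r][c] * d[c] for c in range(3)) for r in range(3))


def project(f, pqs):
    """Project camera coordinates (p,q,s) to image coordinates (u,v) = (f/s)(p,q)."""
    p, q, s = pqs
    if s == 0:
        raise ValueError("point lies in the camera plane (s == 0)")
    return (f * p / s, f * q / s)


def world_to_image(R, camera_pos, f, point):
    """Compose both stages: world (x,y,z) -> camera (p,q,s) -> image (u,v)."""
    return project(f, world_to_camera(R, camera_pos, point))


# Example: unrotated camera 10 units behind the origin, zoom parameter f = 5.
IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(world_to_image(IDENTITY, (0.0, 0.0, -10.0), 5.0, (1.0, 2.0, 0.0)))
```

The result depends on exactly the three quantities the claim names: camera position, camera angle (through R) and camera parameter f.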
28. The method according to claim 27 that, as a first step, 
further comprises: 
calibrating each of the multiple cameras by 
observing a known point, 

knowing thereby for the observed point a pair of image 
coordinates and corresponding world coordinates, 

applying this known pair to the equations of claim 28 
so as to obtain two equations regarding the seven 
parameters that determine camera status, 

repeating the observing, the knowing and the applying 
for at least four known points so as to, the minimum 
equations to solve the seven unknown parameters 
thus being provided, solve the equations and cali- 
brate the camera coordinate system (p,q,s) to the 
world coordinate system (x,y,z). 
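
The equation count behind the calibration step can be sketched as follows. Each known point contributes two equations (one for u, one for v), so seven unknowns need at least four points. The sketch only builds the residuals that a nonlinear least-squares routine would drive to zero; it does not solve them. The angle convention (rotations about x, y, z composed as Rz·Ry·Rx), the offset form of the world-to-camera transform, and all function names are assumptions made for illustration.

```python
import math


def rotation(ax, ay, az):
    """3x3 rotation from three camera angles (radians). ASSUMED convention: Rz*Ry*Rx."""
    cx, sx = math.cos(ax), math.sin(ax)
    cy, sy = math.cos(ay), math.sin(ay)
    cz, sz = math.cos(az), math.sin(az)
    rx = [[1, 0, 0], [0, cx, -sx], [0, sx, cx]]
    ry = [[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]]
    rz = [[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]]

    def matmul(a, b):
        return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
                for i in range(3)]

    return matmul(rz, matmul(ry, rx))


def predict_image_point(params, world):
    """Image (u,v) predicted for a world point by the seven parameters
    (x0, y0, z0, ax, ay, az, f): position, angle and camera parameter."""
    x0, y0, z0, ax, ay, az, f = params
    R = rotation(ax, ay, az)
    d = (world[0] - x0, world[1] - y0, world[2] - z0)
    p, q, s = (sum(R[r][c] * d[c] for c in range(3)) for r in range(3))
    return (f * p / s, f * q / s)


def calibration_residuals(params, pairs):
    """Two residuals per known (image, world) pair; all zero at a perfect fit."""
    out = []
    for (u, v), world in pairs:
        pu, pv = predict_image_point(params, world)
        out.extend([pu - u, pv - v])
    return out


def min_known_points(unknowns=7, equations_per_point=2):
    """Smallest number of known points giving at least as many equations as unknowns."""
    return math.ceil(unknowns / equations_per_point)
```

With four known points the system has eight equations for seven unknowns, which is why the claim recites "at least four".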
29. The method according to claim 27 
wherein the transforming a point (x,y,z) in the world 
coordinate system to a point (p,q,s) in the camera 
coordinate system, and the projecting of the point 
(p,q,s) in the camera coordinate system to a point (u,v) 
on the image plane, assumes, as a simplifying assumption, 
that all points (u,v) are constrained to lie in a plane. 
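
The claim does not say which plane is meant. One reading, assumed here and not confirmed by the text, is that the imaged scene points lie on a known world plane such as a playing field at z = 0. Under that reading an image point determines a unique world point, because the viewing ray can be intersected with the plane. The sketch below illustrates this, again assuming the offset form (p,q,s) = R((x,y,z) - (x0,y0,z0)); the function name is hypothetical.

```python
def image_to_plane(R, camera_pos, f, uv, plane_z=0.0):
    """Back-project image point (u,v) onto the world plane z = plane_z.

    ASSUMPTIONS: scene points lie on a horizontal world plane, and the
    world-to-camera transform is R applied to the offset from camera_pos.
    """
    u, v = uv
    ray_cam = (u, v, f)  # direction of the viewing ray in camera coordinates
    # R is a rotation, so its transpose maps camera directions back to world.
    ray_world = [sum(R[r][c] * ray_cam[r] for r in range(3)) for c in range(3)]
    if ray_world[2] == 0:
        raise ValueError("viewing ray is parallel to the plane")
    t = (plane_z - camera_pos[2]) / ray_world[2]
    if t <= 0:
        raise ValueError("plane is behind the camera along this ray")
    return tuple(camera_pos[i] + t * ray_world[i] for i in range(3))


# Example: an unrotated camera 10 units from the plane, f = 5.
IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(image_to_plane(IDENTITY, (0.0, 0.0, -10.0), 5.0, (0.5, 1.0)))
```

Without such a constraint a single image point fixes only a ray, not a position, which is presumably why the assumption is called simplifying.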
30. A method of presenting to a viewer a particular 
two-dimensional video image of a real-world three dimen- 
sional scene containing a moving object, the method com- 
prising: 

imaging in multiple video cameras each at a different 
spatial location multiple two-dimensional images of the 
real-world scene each at a different spatial perspective, 
not all of which different scene perspectives always and 
invariably show the object as it moves; 

combining in a computer the multiple two-dimensional 
images of the scene into a three-dimensional model of 
the scene containing the scene's moving object; 
selecting in the computer from the three-dimensional 
model a particular two-dimensional image of the scene 
that, out of all the actual images of the scene as were 
imaged by all the multiple video cameras, most closely 
shows the moving object; and 

displaying in a video display the particular two- 
dimensional image of the real-world scene showing the 
moving object. 
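
The selecting step of claim 30 can be illustrated with a small sketch. The claim does not define "most closely shows"; the sketch assumes it means the camera nearest to the object among those whose image actually contains it. The camera record layout (`R`, `pos`, `f`, `half_w`, `half_h`), the offset form of the world-to-camera transform, and the function names are all hypothetical.

```python
import math


def camera_view(camera, obj_pos):
    """Return (u, v, s) if the object falls inside this camera's image, else None."""
    R, pos, f = camera["R"], camera["pos"], camera["f"]
    d = [obj_pos[i] - pos[i] for i in range(3)]
    p, q, s = (sum(R[r][c] * d[c] for c in range(3)) for r in range(3))
    if s <= 0:
        return None  # object is behind the camera
    u, v = f * p / s, f * q / s
    if abs(u) > camera["half_w"] or abs(v) > camera["half_h"]:
        return None  # object projects outside the image frame
    return (u, v, s)


def select_camera(cameras, obj_pos):
    """Pick the name of the nearest camera that shows the object.

    ASSUMPTION: 'most closely shows' is read as smallest camera-to-object
    distance among cameras in which the object is visible.
    """
    best_name, best_dist = None, None
    for name, cam in cameras.items():
        if camera_view(cam, obj_pos) is None:
            continue
        dist = math.dist(cam["pos"], obj_pos)
        if best_dist is None or dist < best_dist:
            best_name, best_dist = name, dist
    return best_name


IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
CAMERAS = {
    "A": {"R": IDENTITY, "pos": (0.0, 0.0, -10.0), "f": 1.0, "half_w": 1.0, "half_h": 1.0},
    "B": {"R": IDENTITY, "pos": (0.0, 0.0, -30.0), "f": 1.0, "half_w": 1.0, "half_h": 1.0},
    "C": {"R": IDENTITY, "pos": (0.0, 0.0, 10.0), "f": 1.0, "half_w": 1.0, "half_h": 1.0},
}
```

Re-running `select_camera` with the object's updated position from the three-dimensional model, as the object moves, switches the displayed camera whenever a different one becomes the nearest that still shows it.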

* * * * * 